S311441 Practical Performance Management for Oracle Real Application Clusters
Michael Zoll, Consulting Member of Technical Staff
Barb Lundhild, Product Manager, Oracle Real Application Clusters
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Agenda
• Oracle RAC Infrastructure and Technical Fundamentals
• Application and Database Design
• Common Problems and Symptoms
• Diagnostics and Problem Determination
• Appendix
Objective
• Convey a few simple and fundamental concepts of Oracle RAC performance
• Summarize application level performance and scalability information
• Provide some simple sizing hints
• Give an exemplary overview of common problems and solutions
• Builds on a similar presentation from OOW 2008: http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
Oracle RAC Infrastructure: Technical Fundamentals, Sizing and Configuration
Oracle RAC Architecture

[Architecture diagram: n nodes (Node 1 ... Node n), each running the operating system, Oracle Clusterware, a database instance, ASM, a VIP, a Listener, and a SCAN_Listener, and offering Services over the public network. All nodes attach to shared storage managed by ASM, which holds the database and control files, the redo and archive logs of all instances, and the OCR and voting disks.]
Global Cache and Global Enqueue Service: Processes and Functions

[Diagram: each instance's SGA contains the buffer cache, library cache, dictionary cache, and log buffer, served by LGWR and DBW0. The Global Cache Service (LMSx processes) and Global Enqueue Service (LMON, LMD0) maintain the Global Resource Directory and communicate with Oracle processes on other instances over the cluster's private high-speed network. LMSx runs in real-time priority.]
Global Cache Access

[Message-flow diagram: (1) a shadow process sends a block request, (2) the LMS process on the holding instance receives it, (3)-(4) if the buffer contains uncommitted changes, LMS posts LGWR to flush the redo, (5) LGWR posts back on write completion, (6) LMS sends the block from its buffer cache. Immediate direct sends account for > 96% of transfers; log write and send for < 4%.]
Basic Performance Facts

• Global cache access takes 100 - 500 usecs (roundtrip)
– Data is immediately served from remote instances via the private, high-speed interconnect
– Redo may have to be written to the log file before the send if the data was changed and has not been committed yet
– Performance varies with network infrastructure and network protocol
• Maximum is 3 network hops (messages)
– For clusters with more than 2 nodes, independent of total cluster size
• CPU cost per OLTP transaction
– Depends on locality of access, i.e. messages per tx
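The locality of access behind this CPU cost can be sampled from the global cache statistics; a minimal sketch, assuming the statistic names exposed in 10g/11g (check V$STATNAME on your release):

```sql
-- Global cache blocks received versus commits: a rough measure of
-- inter-instance messaging per transaction on this instance.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('gc cr blocks received',
                'gc current blocks received',
                'user commits');
```

Dividing the block counts by 'user commits' gives an approximate messages-per-transaction figure to track over time.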
Basic Performance Facts: Latency (UDP/GbE and RDS/IB)

Roundtrip time (ms) by block size:

Protocol   2K     4K     8K     16K
UDP/GE     0.30   0.31   0.36   0.46
RDS/IB     0.12   0.13   0.16   0.20

• RDS/IB also has lower CPU cost relative to other protocols and network infrastructure
• Actual interconnect latency is generally not the problem unless you have exceeded capacity or you are experiencing errors
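Observed latency can be compared against the baseline table above using the cumulative global cache statistics; a minimal sketch, assuming the 10g/11g statistic names, where the receive time is reported in centiseconds:

```sql
-- Average CR block receive latency in milliseconds
-- ('gc cr block receive time' is in centiseconds, hence the *10).
SELECT 10 * t.value / NULLIF(c.value, 0) AS avg_cr_receive_ms
FROM   v$sysstat t, v$sysstat c
WHERE  t.name = 'gc cr block receive time'
AND    c.name = 'gc cr blocks received';
```

A result well above the baseline for your protocol and block size suggests capacity or configuration problems rather than normal global cache behavior.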
Private Interconnect

• The network between the nodes of an Oracle RAC cluster MUST be private/dedicated to traffic between Oracle RAC nodes
• Large (Jumbo) Frames are recommended for GbE
– Avoids fragmentation and reassembly (8K / 1500 MTU = 6 fragments)
• Interconnect bandwidth should be tested with non-Oracle utilities (e.g. iperf)
– No packet loss at 75% - 80% of bandwidth utilization
Interconnect Bandwidth

• Generally, 1Gb/sec is sufficient for performance and scalability in OLTP
• DSS/DW systems should categorically be designed with > 1Gb/sec capacity
• Prediction of interconnect traffic is difficult
– Depends on transaction instruction length per message
– Empirical rule of thumb: 1Gb/sec per 32 CPU cores
• Infiniband and 10GbE are supported for scale-out
Performance and Scalability of Applications and Database Design with RAC
General Scalability

• DML-intensive OLTP workloads
– Scale well if contention is low and database/working set size scales (i.e. add a node when demand grows)
• Read-intensive workloads scale predictably and linearly
– Bigger cache when adding more nodes
– Faster read access to the global cache than to disk, less disk IO
• If cluster size and database size growth are balanced, the system will perform and scale well
Performance and Scaling in Application and Database Design

Response Time Impact
• Index contention on INSERTs when the index is right-growing
– System-generated "artificial" keys such as consecutive order numbers, or "natural" keys such as dates
• UPDATEs or DELETEs of rows in a small working set
– Session logging and tracking
– First-in first-out queues
– State of messages in queues
• Bulk INSERTs of large amounts of data
– LOBs
DML Contention and Serialization

[Diagram: INSERT INTO I with a sequence-generated key concentrates inserts from all instances into the right-most "busy blocks" of index I; UPDATE T SET ... WHERE the row falls into blocks [1..n], with n a small number, concentrates changes into a few "busy blocks" of table T. Modification-intensive operations on a small set of (cached) blocks cause contention.]
Performance and Scaling in Application and Database Design

CPU Cost due to Inter-Instance Messaging and Non-Linear Scaling
• In-memory databases
– Working set spans multiple buffer caches
– Frequent modifications and reads of recent modifications
• Working set fits into the memory of one instance
– Locality of access worsens when nodes are added and users are load balanced
• Scales as long as sufficient CPU power is available
Read-intensive

[Diagram: two instances, each with a 32GB buffer cache, over a 64GB working set on disk. Eventually all blocks are cached, giving a larger effective read cache; reads are then satisfied by cache transfer instead of disk transfer. In 11g, no interconnect messages are needed for this read-mostly case.]
Performance and Scalability

• Good linear or near-linear scaling out of the box for IO- and CPU-intensive applications with large working sets and low proximity of access
– Self-service web applications (shopping carts etc.)
– CRM
– Document storage and retrieval
– Business analytics and data warehousing
Performance and Scalability

• Partitioning or load direction may optimize performance for workloads with high proximity of access, e.g. adding and removing from message queues
– Advanced Queuing and Workflow
• Batch and bulk processes
– Order processing and inventory
– Payroll processing
Identifying Performance and Scaling Bottlenecks in Database Design

The Golden Rules:
• #1: For a first approximation, disregard read-mostly objects and focus on the INSERT-, UPDATE- and DELETE-intensive indexes and tablespaces
• #2: If DML access to data is random, there are no worries as long as CPU is not an issue
• #3: Standard SQL and schema tuning solves > 80% of performance problems. There are usually only a few problem SQL statements and tables.
• #4: Almost everything can be scaled out quickly with load direction and load balancing
Identifying Performance and Scaling Bottlenecks in Database Design

• Look for indexes with right-growing characteristics
– Keys comprising DATE columns or keys generated by sequence numbers
• Find frequent updates of "small" and compact tables
– "small" = fits into a single buffer cache
• Identify frequently and concurrently modified LOBs
HOW?

• Look at segment and SQL statistics in the Automatic Workload Repository
• Use Oracle Enterprise Manager Access Advisories and the Automatic Database Diagnostics Monitor (ADDM)
• Instrumentation with MODULE and ACTION helps identify and quantify components of the workload
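The AWR segment-statistics approach can be sketched as a query against the DBA_HIST views; a minimal sketch, assuming the 11g column names of DBA_HIST_SEG_STAT and DBA_HIST_SEG_STAT_OBJ and the snapshot range bound via :begin_snap/:end_snap:

```sql
-- Segments with the most global cache buffer busy activity between
-- two AWR snapshots: candidates for the contention patterns above.
SELECT   o.owner, o.object_name, o.object_type,
         SUM(s.gc_buffer_busy_delta)        AS gc_buffer_busy,
         SUM(s.gc_cr_blocks_received_delta
           + s.gc_cu_blocks_received_delta) AS gc_blocks_received
FROM     dba_hist_seg_stat s
JOIN     dba_hist_seg_stat_obj o
         ON  o.dbid     = s.dbid
         AND o.ts#      = s.ts#
         AND o.obj#     = s.obj#
         AND o.dataobj# = s.dataobj#
WHERE    s.snap_id BETWEEN :begin_snap AND :end_snap
GROUP BY o.owner, o.object_name, o.object_type
ORDER BY gc_buffer_busy DESC;
```

The top rows of this query correspond to the "Segments by Global Cache Buffer Busy" section of the AWR report.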
Quick Fixes (Without Modifying the Application)

• Indexes with right-growing characteristics
– Cache sequence numbers per instance
– Hash or range partition the table with LOCAL indexes
• Frequent updates of "small" and compact tables
– Reduce block size (2K) and row density of blocks (PCTFREE 99)
• Frequently modified LOBs
– Hash partitions (128 - 256)
– FREEPOOLS
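The first two index fixes can be sketched in DDL; the table, column, and sequence names here are hypothetical:

```sql
-- Cache sequence numbers per instance: each instance pre-allocates its
-- own range of 1000 values, so inserts on different nodes land in
-- different index leaf blocks. NOORDER matters in RAC.
CREATE SEQUENCE order_seq CACHE 1000 NOORDER;

-- Hash partitioning spreads a right-growing key across many segments,
-- diluting the "busy block" at the right edge of the index.
CREATE TABLE orders (
  order_id   NUMBER NOT NULL,
  order_date DATE,
  payload    VARCHAR2(4000)
)
PARTITION BY HASH (order_id) PARTITIONS 64;

-- A LOCAL index is partitioned the same way as the table, so each
-- partition has its own insertion point.
CREATE UNIQUE INDEX orders_pk ON orders (order_id) LOCAL;
```

Partition counts are typically powers of two; 64 here is illustrative, not a recommendation.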
Quick Fixes

• Application modules which may not scale or cannot be quickly reorganized can be directed to particular nodes via cluster-managed services
– For Administrator Managed databases and older releases, create the service with 1 preferred node and the rest available
– For Policy Managed databases, use a singleton service
• Some large-scale and high-performance applications may be optimized by data partitioning (range, hash, or composite) and routing per partitioning key in the application server tier
– E.g. hash by CLIENT_ID, REGION etc.
Leverage Connection Pools (UCP): Load Balancing and Affinity

[Diagram: the application's UCP connection pool receives load advisory messages from the RAC database instances ("I'm busy", "I'm very busy", "I'm idle") and distributes work accordingly across Instance1, Instance2, and Instance3 (e.g. 30% / 60% / 10% of the work).]
Performance and Scalability Enhancements in 11.1 and 11.2

• Read Mostly
– An automatic policy detects read- and disk IO-intensive tables
– No interconnect messages when the policy kicks in -> CPU savings
• Direct reads for large (serial and parallel) scans
– No locks, no buffer cache contention
– Good when the IO subsystem is fast or IO processing is offloaded to storage caches or servers (e.g. Exadata)
• Fusion Compression
– Reduces message sizes and therefore CPU cost
• Dynamic policies to trade off disk IO against global cache transfers
Performance Diagnostics and Checks: Metrics and Method
Normal Behaviour

• It is normal to see time consumed in
– CPU
– db file sequential/scattered read
– direct read
– gc cr/current block 2-way/3-way (transfer from a remote cache)
– gc cr/current grant 2-way (correlates with buffered disk IOs)
• Average latencies should be within baseline parameters
• Most problems boil down to CPU, IO, or network capacity, or to application issues
Normality, Baselines and Significance

Event                      Waits      Time(s)  Avg(ms)  %Time
db file sequential read    2,627,295  21,808   8        43.2   <- most significant response time component
CPU time                              9,156             18.2
gc current block 3-way     3,289,371  4,019    1        8.0  \
gc buffer busy acquire     373,777    3,272    9        6.5   } 20.8%
gc current block 2-way     3,982,284  3,192    1        6.3  /
gc current block busy      125,595    2,931    4

The 2-way/3-way transfers average < 1 ms; the "busy" events indicate contention.
GC waits are influenced by interconnect or remote effects which are not always obvious.
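To see which of these events dominate right now, rather than over an AWR window, the active sessions can be inspected directly; a minimal sketch using the GV$ views:

```sql
-- Active sessions currently waiting on Cluster-class events,
-- grouped per instance across the whole cluster.
SELECT   inst_id, event, COUNT(*) AS sessions_waiting
FROM     gv$session
WHERE    wait_class = 'Cluster'
AND      status     = 'ACTIVE'
GROUP BY inst_id, event
ORDER BY sessions_waiting DESC;
```

A burst of "busy" or "congested" events in this output points to the contention and LMS scenarios discussed in the following slides.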
Distributed Cause and Effect

[Diagram: a root cause on Node 1 (disk capacity exceeded, disk or controller bottleneck) surfaces as global cache wait symptoms on Node 2.]
Example: Cluster-wide Impact of a Log File IO Problem

Global metrics view: WORKLOAD REPOSITORY reports for instance OOW8 (host oowdb8) and instance OOW4 (host oowdb4).

Local symptom (investigate serialization):
  gc buffer busy          9 ms
  gc current block busy   4 ms

Waits for remote instances - Global Cache Transfer Stats:
  Inst  Block Class  Blocks Received  % Immed  % Busy
  4     data block   114,426          95.9     24.1    <- Not OK!
  7     data block   162,630          76.6     23.4

Cause on the remote instance (log file IO):
  gc current block busy:   23 ms
  log file parallel write: 20 ms
  Avg global cache current block flush time (ms): 21 (vs. a 3.7 ms baseline)
Example: Segment Statistics

Segments by Global Cache Buffer Busy:   ES_BILLING TABLE  97.41%
Segments by Current Blocks Received:    ES_BILLING TABLE  85.81%

ANALYSIS: Table ES_BILLING is frequently read and modified on all nodes. The majority of global cache accesses and serialization can be attributed to it.
Comprehensive Cluster-wide Analysis via Global ADDM
Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability
Common Problems and Symptoms
• Interconnect or switch problems
• Slow or bottlenecked disks
• High log file sync latency
• System load and scheduling
Symptoms of Interconnect Problems

Root causes (capacity limit, congestion, dropped packets) show up as high latencies and serialization:

Top 5 Timed Events
Event             Waits    Time(s)  Avg wait(ms)  %Total Call Time  Wait Class
log file sync     286,038  49,872   174           41.7              Commit
gc buffer busy    177,315  29,021   164           24.3              Cluster
gc cr block busy  110,348  5,703    52            4.8               Cluster
gc cr block lost  4,272    4,953    1159          4.1               Cluster
cr request retry  6,316    4,668    739           3.9               Other
Symptoms of an Interconnect Problem

"gc cr block lost" should never appear here: lost blocks are always a severe performance problem.
Interconnect or IPC Problems

[Diagram: the path from Oracle down through protocol processing (IP, UDP), socket queues, device drivers, NICs and ports to the switch, with the diagnostics to check at each layer:]

• Oracle: "gc blocks lost" statistics
• netstat -s (protocol processing):
  – UDP: packet receive errors, socket buffer overflows
  – IP: 1201 fragments dropped after timeout, 467 reassembly failures, 468 incoming packets discarded
• ifconfig -a (device drivers / NICs):
  – TX errors: 135, dropped, overruns; RX errors: 0, dropped: 27, overruns
Cluster-wide Impact of a Database File IO Problem

[Diagram: a root cause on Node 1 (disk capacity exceeded, disk or controller bottleneck, IO-intensive queries) impacts Node 2.]
Cluster-Wide Disk I/O Impact

Top 5 Timed Events (Node 1)
Event             Waits    Time(s)  Avg wait(ms)  %Total Call Time
log file sync     286,038  49,872   174           41.7
gc buffer busy    177,315  29,021   164           24.3
gc cr block busy  110,348  5,703    52            4.8

Load Profile (Node 2)   Per Second
Redo size:              40,982.21
Logical reads:          81,652.41
Physical reads:         51,193.37

CAUSE: an expensive query on Node 2 causes an IO bottleneck:
1. IO on the disk group containing the redo logs is slow
2. Block shipping for frequently modified blocks is delayed by the log flush IO
3. Serialization builds up
Log File Sync Latency: Causes and Symptoms

Courtesy of Vinay Srihari, Oracle Server Technologies, Recovery

Causes of High Commit Latency

• Symptom of slow log writes
– An I/O service time spike may last only seconds or minutes
– Threshold-based warning message in the LGWR trace file:
  • "Warning: log write elapsed time xx ms, size xxKB"
  • Dumped when write latency >= 500ms
– A large log_buffer makes a bad situation worse
• Fixes
– Smooth out log file IO on the primary system and the standby redo apply I/O pattern
– Primary and standby storage subsystems should be configured for peaks
– Apply the bug fixes in the appendix
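One way to confirm slow or spiky log writes is the wait-event histogram; a minimal sketch against V$EVENT_HISTOGRAM (available since 10g):

```sql
-- Latency distribution of LGWR's writes. A healthy system concentrates
-- in the low-millisecond buckets; counts in the buckets at or above
-- 512 ms line up with the LGWR warning threshold mentioned above.
SELECT   wait_time_milli, wait_count
FROM     v$event_histogram
WHERE    event = 'log file parallel write'
ORDER BY wait_time_milli;
```

Because the view is cumulative since instance startup, comparing two samples taken minutes apart isolates a transient spike.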
Block Server Process Busy or Starved

[Diagram: a root cause on Node 1 (too few LMS processes, LMS not running at high priority, memory problems/swapping) impacts Node 2.]
Block Server Process Busy or Starved

Top 5 Timed Events
Event                       Waits    Time(s)  Avg wait(ms)  %Total Call Time  Wait Class
gc cr grant congested       26,146   28,761   1100          39.1              Cluster
gc current block congested  13,237   13,703   1035          18.6              Cluster
gc cr grant 2-way           340,281  12,810   38            17.4              Cluster
gc current block 2-way      119,098  4,276    36            5.8               Cluster
gc buffer busy              8,109    3,460    427           4.7               Cluster

"Congested": LMS could not dequeue messages fast enough.
On the remote node: Avg message sent queue time (ms): 16.1
Block Server Processes Busy

• Increase the number of LMS processes based on
– Occurrence of "congested" wait events
– Heuristics: 75 - 80% busy is ok
– Avg send queue time > 1ms
• Caveat: the number of CPUs should always be >= the number of LMS processes to avoid starvation
• On NUMA and CMT architectures
– Bind LMS to a NUMA board or to cores in a processor set
– Fence off hardware interrupts from the processor sets
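The LMS count is set by the GCS_SERVER_PROCESSES initialization parameter, which is static; a sketch of checking and raising it (the value 4 is illustrative, and it should stay below the CPU count per the caveat above):

```sql
-- Current number of LMS (global cache server) processes.
SELECT value
FROM   v$parameter
WHERE  name = 'gcs_server_processes';

-- Raise it cluster-wide for the next restart; the parameter is static,
-- so SCOPE=SPFILE is required.
ALTER SYSTEM SET gcs_server_processes = 4 SCOPE = SPFILE SID = '*';
```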
High Latencies in Global Cache

[Diagram: root causes of high global cache latencies.]
Transient Problems and Hangs

Temporary Slowness and Hangs

• Can affect one or more instances in the cluster
• Can be related to
– IO issues at log switch time (checkpoint or archiver slow)
– A process stuck waiting for IO
– A connection storm
• Hard to establish causality with AWR statistics
• Use Oracle Enterprise Manager and Active Session History
Temporary Cluster Wait Spike

[Screenshot: spike in global cache response time; SQL with high global cache wait time]

Courtesy of Cecilia Gervasio, Oracle Server Technologies, Diagnostics and Manageability

Temporary Slowness or Hang

[Screenshot: ASH report for a slowdown from 5:00-5:30, generated with $ORACLE_HOME/rdbms/admin/ashrpt.sql]
Additional Diagnostics

• For all slowdowns with high averages in gc wait time
– Active Session History report (all nodes)
– Set event 10708 on selected processes:
  • event 10708 trace name context forever, level 7
  • Collect the trace files
– Set event 10899 system-wide
  • Threshold-based, i.e. no cost
– Continuous OS statistics
  • Cluster Health Monitor (IPD/OS)
– LMS, LMD, LGWR trace files
– DIA0 trace files
  • Hang analysis
Conclusions

Golden Rules for Performance and Scalability in Oracle RAC

• Thorough configuration and testing of the infrastructure is the basis for stable performance
• Anticipating application and database bottlenecks and their possibly magnified impact in Oracle RAC is relatively simple
• Enterprise Manager provides monitoring and quick diagnosis of cluster-wide issues
• Basic intuitive and empirical guidelines to approach performance problems suffice for all practical purposes
Q U E S T I O N S   &   A N S W E R S
http://otn.oracle.com/rac
Recommended Sessions

DATE and TIME                      SESSION
Tuesday, October 13, 1:00 PM       Next Generation Database Grid - Moscone South 104
Tuesday, October 13, 2:30 PM       Single Instance Oracle Real Application Clusters - Better Virtualization for Databases - Moscone South 300
Wednesday, October 14, 11:45 AM    Understanding Oracle Real Application Clusters Internals - Moscone South 104
Thursday, October 15, 9:00 AM      Oracle ACFS: The Awaited Missing Feature - Moscone South 305
Visit us in the Moscone West Demogrounds Booth W-037
Appendix
References
• http://www.oracle.com/technology/products/database/clustering/pdf/s298716_oow2008_perf.pdf
• http://otn.oracle.com/rac