Best Practices of Oracle 10g/11g Clusterware: Configuration, Administration and Troubleshooting
Kai Yu, Dell Inc.
About the Author: Kai Yu
• Senior System Consultant in the Dell Oracle Solutions Engineering lab: kai_yu@dell.com, 512-728-0046
• Oracle Database/Applications DBA since 1995
• IOUG Collaborate 09 Committee Member and RAC SIG US Event Chair
• Frequent presenter and author for the IOUG SELECT Journal, Dell Power Solutions, OOW 06/07/08, Collaborate 08/09, and RAC SIG web seminars
Oracle Clusterware Architecture
Introduction to Oracle Clusterware
– Clusterware role: manage cluster resources
– Components: OCR, voting disk, interconnect, clusterware processes
Oracle Clusterware Architecture
Oracle Clusterware components
– Voting disk: stores cluster membership information; use at least 3 copies
– OCR: stores information about the clusterware resources; multiplex the OCR for high availability
– Clusterware processes:
• CRSD manages cluster resources:
root    9204     1  0 06:03 ?  00:00:00 /bin/sh /etc/init.d/init.crsd run
root   10058  9204  0 06:03 ?  00:01:23 /crs/product/11.1.0/crs/bin/crsd.bin reboot
• CSSD manages node membership through the heartbeat and the voting disk, and notifies the cluster of membership status changes:
root    9198     1  0 06:03 ?  00:00:22 /bin/sh /etc/init.d/init.cssd fatal
root   10095  9198  0 06:03 ?  00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root   10114  9198  0 06:03 ?  00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon
root   10151  9198  0 06:03 ?  00:00:00 /bin/sh /etc/init.d/init.cssd daemon
oracle 10566 10151  0 06:03 ?  00:00:40 /crs/product/11.1.0/crs/bin/ocssd.bin
• EVMD (Event Manager) publishes events through ONS and communicates between CRS and CSS:
Oracle Clusterware Architecture
root    9196     1  0 06:03 ?  00:00:00 /bin/sh /etc/init.d/init.evmd run
root   10059  9196  0 06:03 ?  00:00:00 /bin/su -l oracle -c sh -c 'ulimit -c unlimited; cd /crs/product/11.1.0/crs/log/kblade2/evmd; exec /crs/product/11.1.0/crs/bin/evmd'
oracle 10060 10059  0 06:03 ?  00:00:02 /crs/product/11.1.0/crs/bin/evmd.bin
– OPROCD, the process monitor, monitors the cluster for hangs:
• Locked in memory
• Introduced in 10.2.0.4; replaces the hangcheck timer
• Reboots the node if the check fails
root    9198     1  0 06:03 ?  00:00:22 /bin/sh /etc/init.d/init.cssd fatal
root   10095  9198  0 06:03 ?  00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root   10465 10095  0 06:03 ?  00:00:00 /crs/product/11.1.0/crs/bin/oprocd run -t 1000 -m 500 -f
– Other processes:
• RACG:
oracle 12039     1  0 06:06 ?  00:00:00 /opt/oracle/product/11.1.0/asm/bin/racgimon daemon ora.kblade2.ASM2.asm
oracle 12125     1  0 06:06 ?  00:00:06 /opt/oracle/product/11.1.0/db_1/bin/racgimon startd test1db
• ONS (Oracle Notification Services):
oracle 12063     1  0 06:06 ?  00:00:00 /crs/oracle/product/11.1.0/crs/opmn/bin/ons -d
oracle 12064 12063  0 06:06 ?  00:00:00 /crs/oracle/product/11.1.0/crs/opmn/bin/ons -d
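A quick way to confirm these daemons are healthy is the clusterware's own status commands; a minimal sketch (10g/11gR1 syntax, run from any cluster node):
$CRS_HOME/bin/crsctl check crs      # reports whether CSS, CRS and EVM appear healthy
$CRS_HOME/bin/crs_stat -t           # tabular state of all registered resources (VIPs, listeners, instances)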
Oracle Clusterware Architecture
Hardware Configuration of Oracle Clusterware
– Servers, shared storage, interconnect
– Two interconnect switches for redundant interconnects
– Butterfly connections to the shared storage: servers <-> IO switches <-> SAN storage
Shared Storage Configuration
Storage requirements:
– Shared storage for the OCR and voting disk
– Types: block devices, raw devices, OCFS/OCFS2, or NFS on certified NAS (Oracle Storage Compatibility Program list)
– HA requirements for the shared storage
Physical connections to the shared SAN storage:
– Fully redundant active-active IO paths, for HA and IO load balancing
– FC storage: dual HBAs and dual FC switches, so that each server has two independent paths to both storage processors
Shared Storage Configuration
Fully redundant IO paths for EqualLogic iSCSI storage:
– Multiple NICs on each server and two Gigabit Ethernet switches
– On each storage control module, one network interface connects to one switch and the other two network interfaces connect to the other switch
Shared Storage Configuration
Multipath Devices of the Shared Storage
– Use a multipathing device driver to combine multiple IO paths into a single device
– Options: the Linux native Device Mapper (DM), or a storage vendor driver such as EMC PowerPath
– Example of configuring Linux Device Mapper (DM):
• Verify the package: rpm -qa | grep device-mapper
• Find the unique SCSI ID of the device (both paths return the same ID):
$ /sbin/scsi_id -gus /block/sdb
36090a028e093fc906099540639aa2149
$ /sbin/scsi_id -gus /block/sde
36090a028e093fc906099540639aa2149
• Configure multipathing in /etc/multipath.conf:
multipath {
    wwid 36090a028e093fc906099540639aa2149   # <---- for sdb and sde
    alias votingdisk1
}
......
• Restart the daemon: service multipathd restart
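For reference, a fuller sketch of the /etc/multipath.conf layout; the WWID comes from the scsi_id output above, and the alias names are labels chosen for this example:
# /etc/multipath.conf (sketch)
defaults {
    user_friendly_names yes
}
multipaths {
    multipath {
        wwid  36090a028e093fc906099540639aa2149   # sdb and sde: two paths to one LUN
        alias votingdisk1
    }
    # one multipath {} stanza per shared LUN (OCR, voting, data volumes)
}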
Shared Storage Configuration
• Verify the multipath devices: multipath -ll
$ ls -lt /dev/mapper/*
brw-rw---- 1 root disk 253, 8 Feb 18 02:02 /dev/mapper/votingdisk1
– EMC PowerPath driver for EMC storage:
• Install the EMC PowerPath and Naviagent software:
rpm -ivh EMCpower.LINUX-5.1.2.00.00-021.rhel5.x86_64.rpm
rpm -ivh naviagentcli-6.24.2.5.0-1.noarch.rpm
• Start the naviagent and PowerPath daemons:
service naviagent start
service PowerPath start
• Verify the EMC pseudo devices in /proc/partitions:
120 32 419430400 emcpowerc
• Inspect the IO paths: powermt display dev=emcpowerc
Pseudo name=emcpowerc
==============================================================================
---------------- Host ---------------  - Stor -  -- I/O Path -  -- Stats ---
### HW Path     I/O Paths   Interf.    Mode      State   Q-IOs  Errors
==============================================================================
  2 lpfc        sdc         SP B1      active    alive     0      0
  2 lpfc        sdh         SP A0      active    alive     0      0
Shared Storage Configuration
Block Devices vs. Raw Devices
– For RHEL 4: use raw devices for 10g, block devices for 11g
– For RHEL 5: raw devices are deprecated:
• 11g clusterware: use block devices and set the proper ownership and permissions in the /etc/rc.local file:
chown root:oinstall /dev/mapper/ocr*
chmod 0640 /dev/mapper/ocr*
chown oracle:oinstall /dev/mapper/voting*
chmod 0640 /dev/mapper/voting*
• Two options for 10g RAC:
a. Use 11g clusterware for the 10g RAC
b. Map block devices to raw devices with udev rules:
/etc/udev/rules.d/65-raw.rules:
ACTION=="add", KERNEL=="emcpowera1", RUN+="/bin/raw /dev/raw/raw1 %N"
/etc/udev/rules.d/89-raw_permissions.rules:
KERNEL=="raw1", OWNER="root", GROUP="oinstall", MODE="640"
c. Start udev: /sbin/start_udev
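To confirm the udev mapping took effect, query the raw bindings and their permissions; a short sketch, assuming the raw1 binding created above:
/usr/bin/raw -qa          # list all current raw device bindings
ls -l /dev/raw/raw1       # verify the owner/group/mode set by the permissions rule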
Network Configuration
Public IP and Virtual IP Configuration
– Virtual IP (VIP) for fast database connection failover:
• Avoids a possible 10-minute TCP/IP timeout
• Automatically fails over to another node
$ srvctl status nodeapps -n kblade1
VIP is running on node: kblade1
GSD is running on node: kblade1
Listener is running on node: kblade1
ONS daemon is running on node: kblade1
When kblade1 fails, kblade1-vip fails over to kblade2:
$ srvctl status nodeapps -n kblade1
VIP is running on node: kblade2
GSD is not running on node: kblade1
Listener is not running on node: kblade1
ONS daemon is not running on node: kblade1
$ ping kblade1-vip
PING 155.16.9.171 (155.16.9.171) 56(84) bytes of data.
From 155.16.0.1 icmp_seq=9 Destination Host Unreachable
From 155.16.0.1 icmp_seq=9 Destination Host Unreachable
..... (a roughly 2-second wait while the VIP fails over)
64 bytes from 155.16.9.171: icmp_seq=32 ttl=64 time=2257 ms
64 bytes from 155.16.9.171: icmp_seq=33 ttl=64 time=1258 ms
Network Configuration
Private Interconnect Configuration
– Fully redundant Ethernet interconnects: two NICs per server and two dedicated, non-routed interconnect switches
– NIC teaming to bond the two network interfaces for failover:
ifcfg-eth1:           ifcfg-eth2:           ifcfg-bond0:
DEVICE=eth1           DEVICE=eth2           DEVICE=bond0
USERCTL=no            USERCTL=no            IPADDR=192.168.9.52
ONBOOT=yes            ONBOOT=yes            NETMASK=255.255.255.0
MASTER=bond0          MASTER=bond0          ONBOOT=yes
SLAVE=yes             SLAVE=yes             BOOTPROTO=none
BOOTPROTO=none        BOOTPROTO=none        USERCTL=no
TYPE=Ethernet         TYPE=Ethernet
– Add the following to /etc/modprobe.conf:
alias bond0 bonding
options bonding miimon=100 mode=1
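Once networking restarts, the kernel's bonding status file shows the mode, the MII status, and the enslaved interfaces; a quick check:
service network restart           # bring up bond0 with the new configuration
cat /proc/net/bonding/bond0       # verify mode 1 (active-backup), miimon, and both slaves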
Network Configuration
Private Interconnect Configuration
– Configuration best practices from Oracle (refer to [7]):
• Set the UDP send/receive buffers to the maximum
• Use the same interconnect for both Oracle Clusterware and Oracle RAC communication
• NIC settings for the interconnect:
1. Define flow control: rx=on, tx=off
2. Ensure the NIC names/slot order is identical on all nodes
3. Configure the interconnect NICs on the fastest PCI bus
4. Set jumbo frames (MTU=9000) in the ifcfg-eth1/ifcfg-eth2 files, with the same setting on the switches
• Recommended Linux kernel network parameters (persisted in the sketch after this list):
                        11gR1     10gR2
net.core.rmem_default   4194304   262144
net.core.rmem_max       4194304   262144
net.core.wmem_default   262144    262144
net.core.wmem_max       262144    262144
• Network heartbeat misscount: 60 seconds for 10g, 30 seconds for 11g
• Hangcheck-timer values:
modprobe hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1
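To make the kernel network parameters and the hangcheck-timer settings persist across reboots, they belong in /etc/sysctl.conf and /etc/modprobe.conf; a minimal sketch using the 11gR1 values from the table above:
# /etc/sysctl.conf (apply immediately with: sysctl -p)
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 262144
# /etc/modprobe.conf
options hangcheck-timer hangcheck_tick=1 hangcheck_margin=10 hangcheck_reboot=1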
Managing Oracle Clusterware
Managing the voting disk
– Locate the voting disks: crsctl query css votedisk
– Backup/restore using the dd command, for example:
dd if=/dev/mapper/votingdisk1p1 of=/backup/vd bs=4096
dd if=/backup/vd of=/dev/mapper/votingdisk1p1 bs=4096
– Adding/removing a voting disk:
crsctl add css votedisk /dev/mapper/votingdisk3p1 -force
crsctl delete css votedisk /dev/mapper/votingdisk2p1 -force
Managing the OCR
– Three tools: OCRCONFIG, OCRDUMP and OCRCHECK (see the sketch below)
– Add a mirror: ocrconfig -replace ocrmirror /dev/mapper/ocr2p1
– Replace the OCR: ocrconfig -replace ocr /u01/ocr/ocr1
– OCR backups: the clusterware backs up the OCR automatically;
show the backups: ocrconfig -showbackup
– Manual backup: ocrconfig -manualbackup
– OCR export/import:
ocrconfig -export /home/oracle/ocr_export
ocrconfig -import /home/oracle/ocr_export
– Check the OCR integrity: cluvfy comp ocr
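A short usage sketch of these verification tools (run as root; the dump file path is arbitrary):
ocrcheck                    # report OCR version, space usage and integrity check result
ocrdump /tmp/ocr_dump.txt   # dump the OCR contents to a readable text file
cluvfy comp ocr -n all      # verify OCR integrity across all cluster nodes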
Managing Oracle Clusterware
Extending the Cluster by Cloning Oracle Clusterware
– Task:
• Existing cluster: k52950-3-n1 and k52950-3-n2
• Add a new node: k52950-3-n3
– Step 1: Prerequisite tasks on the new node k52950-3-n3:
OS installation, shared storage for the OCR and voting disk
Network configuration: public, private and VIP
– Step 2: Prepare and copy the CRS home:
• Create a CRS home backup on the source node k52950-3-n1
• Copy the CRS home backup to the new node
• Shut down CRS on the source node
• Remove all the log files and trace files from the backup
• Create the Oracle inventory on the new node
• Set the ownership of the Oracle inventory
• Run preupdate.sh on the new node
– Step 3: Run the CRS clone process on the new node, as sketched below:
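A minimal sketch of the clone step, run as the oracle user on the new node; the CRS home path follows the earlier examples, and the -O arguments (OUI session variables carrying the cluster node lists) are installation-specific, so they are elided here; see the Clusterware Administration and Deployment Guide [4] for the full syntax:
cd /crs/oracle/product/11.1.0/crs/clone/bin
perl clone.pl ORACLE_HOME=/crs/oracle/product/11.1.0/crs \
              ORACLE_HOME_NAME=OraCrs11g_home \
              '-O...'    # node-list OUI variables elided; see [4]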
Managing Oracle Clusterware
Execute ./root.sh as root on the new node k52950-3-n3 as instructed
– Step 4: Run addNode.sh on the source node k52950-3-n1
Managing Oracle Clusterware
Start CRS on node 1 (k52950-3-n1) and execute rootaddnode.sh on the source node k52950-3-n1
Managing Oracle Clusterware
Execute root.sh on the new node:
• Restart CRS on node 2 (k52950-3-n2):
[root@k52950-3-n2 bin]# ./crsctl start crs
Clusterware Troubleshooting
Split Brain Condition and IO Fencing Mechanism
• Split brain condition: a node failure partitions the cluster into multiple sub-clusters, each without knowledge of the others' existence
• Consequence: data collision and corruption
• IO fencing: fence the failed node off from all IO (STOMITH)
• Node eviction: pick a cluster node as the victim to reboot;
always keep the largest possible sub-cluster up and evict the other nodes
(with two nodes: keep node 1 up and evict node 2)
• Two CSS heartbeats, each with a timeout, detect when a node must be evicted
• Network heartbeat (NHB): sent across the private interconnect to establish and confirm valid node membership;
CSS misscount: the maximum number of seconds a heartbeat can be missed before a node eviction is triggered
• Disk heartbeat: between the cluster node and the voting disk;
its timeout (CSS disktimeout) defaults to 200 seconds for 10.2.0.1 and up (see the sketch below)
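The current values can be read directly from CSS; a quick sketch (run as root, per MetaLink #294430.1 [2]):
crsctl get css misscount      # seconds of missed network heartbeats before eviction
crsctl get css disktimeout    # seconds of missed voting-disk heartbeats before eviction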
Clusterware Troubleshooting
Node Eviction Diagnosis Case Study
– Node 7 evicted in an 11-node 10g cluster on Linux
/var/log/messages:
Jul 23 11:15:23 racdb7 logger: Oracle clsomon failed with fatal status 12.
Jul 23 11:15:23 racdb7 logger: Oracle CSSD failure 134.
Jul 23 11:15:23 racdb7 logger: Oracle CRS failure. Rebooting for cluster integrity
OCSSD log ($CRS_HOME/log/<hostname>/cssd/ocssd.log):
[ CSSD]2008-07-23 11:14:49.150 [1199618400] >WARNING: clssnmPollingThread: node racdb7 (7) at 50% heartbeat fatal, eviction in 29.720 seconds
.. clssnmPollingThread: node racdb7 (7) at 90% heartbeat fatal, eviction in 0.550 seconds ...
[ CSSD]2008-07-23 11:15:19.079 [1220598112] >TRACE: clssnmDoSyncUpdate: Terminating node 7, racdb7, misstime(60200) state(3)
– Root cause analysis:
• A network heartbeat failure triggered the node eviction on node 7
• The node's private IP was not pingable right before the eviction
• The public and private networks shared a single physical switch
– Solution: use two dedicated switches for the interconnect
– Result: no more node evictions after the switch change
Clusterware Troubleshooting
CRS Reboot Troubleshooting Procedure
– The OCSSD, OPROCD and OCLSOMON monitoring processes:
• Detect certain conditions that could impact data integrity
• Kill the clusterware and trigger a CRS reboot
• Leave critical errors in the related log files
– Troubleshooting methods:
• Review the syslog file, /var/log/messages on Linux
• OCSSD log (ocssd.log): node evictions due to internal health errors, or interconnect and membership errors
• OPROCD log (/etc/oracle/oprocd/<host>.oprocd.log): node evictions due to hardware and driver freezes
• OCLSOMON log (oclsmon.log): reboots due to hangs/scheduling issues
– Troubleshooting flowchart
Clusterware Troubleshooting
RAC Diagnostic Tools:
– Diagwait:
• Delays the node reboot for a short time so that all diagnostic messages can be written to the logs
• Does not increase the probability of data corruption
• Setup steps (sketched below): shut down CRS, run
crsctl set css diagwait 13 -force
and then restart CRS
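The full diagwait sequence as a sketch; the clusterware stack must be down on every node when the value is set:
crsctl stop crs                     # as root, on every node
crsctl set css diagwait 13 -force   # on one node, with the stack down
crsctl get css diagwait             # verify the new value
crsctl start crs                    # restart the stack on every node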
– Oracle problem detection tool (IPD/OS):
• Monitors and records resource degradation and failures related to Oracle Clusterware and Oracle RAC
• Historical mode goes back to the time before a node eviction
• Requires Linux x86, kernel 2.6.9 and up
– RAC-RDDT and OSWatcher:
• Collect information leading up to the time of a reboot
• Based on OS utilities: netstat, iostat, vmstat
• MetaLink #301138.1, #301137.1; an OSWatcher usage sketch follows
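A usage sketch for OSWatcher per MetaLink #301137.1; the install directory is an assumption, and the two arguments are the sample interval in seconds and the number of hours of data to retain:
cd /opt/oswatcher      # assumed install location
./startOSW.sh 60 10    # sample every 60 seconds, keep the last 10 hours
./stopOSW.sh           # stop the collectors when finished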
References
[1] Mike Ault and Madhu Tumma, Oracle 10g Grid & Real Application Clusters: Oracle 10g Grid Computing with RAC
[2] Oracle MetaLink Note #294430.1, CSS Timeout Computation in Oracle Clusterware
[3] Oracle MetaLink Note #265769.1, Troubleshooting CRS Reboots
[4] Oracle Clusterware Administration and Deployment Guide 11g Release 1 (11.1), September 2007
[5] Kai Yu, Deploying Oracle Database 11g R1 Enterprise Edition Real Application Clusters with Red Hat Enterprise Linux 5.1 and Oracle Enterprise Linux 5.1 on Dell PowerEdge Servers and Dell/EMC Storage,
http://www.dell.com/downloads/global/solutions/11gr1_ee_rac_on_rhel5_1__and_OEL.pdf?c=us&cs=555&l=en&s=biz
[6] Dell | Oracle Supported Configurations: Oracle Enterprise Linux 5.2, Oracle 11g Enterprise Edition Deployment Guide,
http://www.dell.com/content/topics/global.aspx/alliances/en/oracle_11g_oracle_ent_linux_4_1?c=us&cs=555&l=en&s=biz
[7] Barb Lundhild, Oracle Real Application Clusters Internals, Oracle OpenWorld 2008 presentation #298713
[8] Murali Vallath, Looking Under the Hood at Oracle Clusterware, Oracle OpenWorld 2008 presentation #299963
Q/A