Transcript of 2012373nelsonpptnew-120427121545-phpapp02
Leighton L. Nelson
Oracle DBA Team Lead (10 yrs experience, 6 years with RAC)
RAC SIG US Events Chair and IOUG Liaison
Session #373
Looking at RAC, GI/Clusterware Diagnostic Tools
Clusterware & RAC is Complex!
Where do I begin?
Clusterware, ASM & RAC Diagnostics
•Diagcollection
•Cluster Verification Utility (cluvfy)
•Cluster Health Monitor (CHM)
•Remote Diagnostic Agent (RDA)
•ADRCI/Support Workbench
•OS Utilities
Diagcollection
• Gathers and packages Clusterware logs and traces, plus OS logs and core files (core files only with the --core option)
• $ORA_CRS_HOME/bin/diagcollection.pl --collect --crshome $ORA_CRS_HOME (10gR2)
• $GRID_HOME/bin/diagcollection.pl --collect [--crs | --core | --all] (11gR2)
• Logs can be filtered by date/time with --adr, --aftertime and --beforetime
• Allocate enough space in the current directory for the diagnostic files
• Needs to be run on all nodes in the cluster
• Limited information is collected if not run as root
• In 11.2, diagcollection was enhanced to collect ADR and CHM data
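Because diagcollection must run on every node, ideally as root, and writes its archives to the current directory, a short pre-flight check on each node can avoid a wasted collection. The following is a minimal Python sketch, not an Oracle-supplied tool: the Grid home path is taken from the diagcollection example, while the 2 GB free-space threshold and the helper names `preflight` and `diagcollection_argv` are hypothetical.

```python
import os
import shutil

# Assumption: Grid home path from the diagcollection example; adjust per cluster.
GRID_HOME = "/u01/app/11.2.0/grid"
# Hypothetical threshold: size it to the volume of logs and cores on your nodes.
MIN_FREE_BYTES = 2 * 1024**3

def preflight(workdir=".", min_free=MIN_FREE_BYTES):
    """Return warnings to resolve before running diagcollection in workdir."""
    warnings = []
    free = shutil.disk_usage(workdir).free
    if free < min_free:
        warnings.append(f"only {free} bytes free in {os.path.abspath(workdir)}")
    # Without root, diagcollection collects only limited information.
    if os.geteuid() != 0:
        warnings.append("not running as root: limited information will be collected")
    return warnings

def diagcollection_argv(scope="--all", grid_home=GRID_HOME):
    """Build the 11gR2 command line; scope is one of --crs, --core, --all."""
    if scope not in ("--crs", "--core", "--all"):
        raise ValueError(f"unsupported scope: {scope}")
    return [os.path.join(grid_home, "bin", "diagcollection.pl"), "--collect", scope]

if __name__ == "__main__":
    for warning in preflight():
        print("WARNING:", warning)
    print(" ".join(diagcollection_argv()))
```

Once `preflight()` comes back empty on a node, the list from `diagcollection_argv()` can be passed to `subprocess.run` there; the scope argument mirrors the --crs, --core and --all options listed for 11gR2.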
diagcollection example
[root@oelgrid02 u02]# /u01/app/11.2.0/grid/bin/diagcollection.sh --collect
Production Copyright 2004, 2010, Oracle. All rights reserved
Cluster Ready Services (CRS) diagnostic collection tool
The following CRS diagnostic archives will be created in the local directory:
crsData_oelgrid02_20120225_1723.tar.gz -> logs, traces and cores from CRS home. Note: core files will be packaged only with the --core option.
ocrData_oelgrid02_20120225_1723.tar.gz -> ocrdump, ocrcheck etc
coreData_oelgrid02_20120225_1723.tar.gz -> contents of CRS core files in text format
osData_oelgrid02_20120225_1723.tar.gz -> logs from Operating System
Collecting crs data
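When the archives from every node are copied to one place for Oracle Support, sorting them by host and collection time is a common first step. This Python sketch assumes the `<kind>Data_<host>_<YYYYMMDD>_<HHMM>.tar.gz` naming seen in the sample output above; that pattern is inferred from this one run rather than from documentation, and `parse_archive` and `group_by_host` are hypothetical helper names.

```python
import re
from datetime import datetime

# Naming pattern inferred from the sample output (an assumption, not a documented format).
ARCHIVE_RE = re.compile(r"^(crs|ocr|core|os)Data_(.+)_(\d{8})_(\d{4})\.tar\.gz$")

def parse_archive(name):
    """Split a diagcollection archive name into (kind, host, timestamp)."""
    m = ARCHIVE_RE.match(name)
    if m is None:
        raise ValueError(f"not a diagcollection archive name: {name}")
    kind, host, day, hhmm = m.groups()
    return kind, host, datetime.strptime(day + hhmm, "%Y%m%d%H%M")

def group_by_host(names):
    """Group archive names gathered from several nodes by host name."""
    groups = {}
    for name in names:
        _, host, _ = parse_archive(name)
        groups.setdefault(host, []).append(name)
    return groups
```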
Cluster Verification Utility
• Cluvfy runs in stage mode or component mode
• Can be executed from the Grid Infrastructure Home in 11gR2 or from installation media
• New Clusterware resource in 11.2.0.2.0: ora.cvu
• “cluvfy comp -list” displays the components that can be checked
• For standalone cluvfy, set CV_HOME, CV_JDKHOME and CV_DESTLOC
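For the standalone cluvfy kit, the three variables above have to be present in the environment before the tool is launched. A minimal Python sketch of that setup follows; the paths are placeholders, and the `$CV_HOME/bin/cluvfy` location of the launcher is an assumption to check against the kit you download.

```python
import os

# Variables the slide lists for a standalone (non-installed) cluvfy.
REQUIRED_VARS = ("CV_HOME", "CV_JDKHOME", "CV_DESTLOC")

def standalone_cluvfy_env(cv_home, jdk_home, dest_loc, base=None):
    """Return a copy of base (default: os.environ) with the standalone cluvfy variables set."""
    env = dict(os.environ if base is None else base)
    env.update(CV_HOME=cv_home, CV_JDKHOME=jdk_home, CV_DESTLOC=dest_loc)
    return env

def missing_vars(env):
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

def list_components_argv(cv_home):
    """Command line for 'cluvfy comp -list' from the standalone kit."""
    # Assumption: the kit places the cluvfy launcher under $CV_HOME/bin.
    return [os.path.join(cv_home, "bin", "cluvfy"), "comp", "-list"]

if __name__ == "__main__":
    # Placeholder paths: substitute your kit location, JDK and CV_DESTLOC directory.
    env = standalone_cluvfy_env("/u01/stage/cvu", "/usr/java/latest", "/tmp/cvu")
    print(missing_vars(env), list_components_argv(env["CV_HOME"]))
```

Passing the returned dictionary as `env=` to `subprocess.run` keeps the settings local to the cluvfy invocation instead of changing the caller's shell.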
• Use stage mode during installation/upgrade
• Use component mode to diagnose components after Clusterware installation
• Doesn't diagnose all components, e.g. HAIP
• $GRID_HOME/bin/cluvfy (installed Grid home) or $INSTALL_DISK/runcluvfy.sh (installation media)
• New in 11.2.0.3.0: cluvfy comp healthcheck
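The split between stage mode (install/upgrade) and component mode (after installation) can be captured in one helper. This Python sketch only assembles command lines; the `stage -pre crsinst`, `stage -post crsinst`, `comp <component> -n` and `comp healthcheck` forms are standard cluvfy syntax, but the phase names and the `cluvfy_argv` helper are hypothetical, and options should be confirmed with cluvfy's built-in help on your release.

```python
def cluvfy_argv(phase, nodes=None, component=None, cluvfy="cluvfy"):
    """Return a cluvfy command line for a point in the cluster lifecycle.

    cluvfy is $GRID_HOME/bin/cluvfy once installed, or
    $INSTALL_DISK/runcluvfy.sh from the installation media.
    """
    node_list = ",".join(nodes) if nodes else "all"
    if phase == "pre-install":
        # Stage mode; no cluster exists yet, so "all" cannot be resolved.
        if not nodes:
            raise ValueError("pre-install checks need an explicit node list")
        return [cluvfy, "stage", "-pre", "crsinst", "-n", node_list]
    if phase == "post-install":
        # Stage mode, verifying the completed Clusterware installation.
        return [cluvfy, "stage", "-post", "crsinst", "-n", node_list]
    if phase == "diagnose":
        # Component mode; valid names come from "cluvfy comp -list".
        if component is None:
            raise ValueError("component mode needs a component name")
        argv = [cluvfy, "comp", component]
        return argv + ["-n", node_list] if nodes else argv
    if phase == "healthcheck":
        # New in 11.2.0.3.0.
        return [cluvfy, "comp", "healthcheck"]
    raise ValueError(f"unknown phase: {phase}")
```

For example, `cluvfy_argv("pre-install", nodes=["oelgrid01", "oelgrid02"], cluvfy="./runcluvfy.sh")` yields the stage-mode check run from the installation media before Clusterware exists.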
Screenshot: cluvfy comp -list output
Cluster Health Monitor (CHM)
• Cluster Health Monitor (CHM) monitors and collects OS and Clusterware metrics in real time
• Installed by default in 11.2.0.2+
• Collects metrics at a 1-second interval in 11.2.0.2 and a 5-second interval in 11.2.0.3
• Command-line interface: $GRID_HOME/bin/oclumon
• Collect CHM data with diagcollection.pl --collect --chmos
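The change in sampling interval between releases is easy to quantify. The Python arithmetic below uses only the intervals stated on the slide (1 second in 11.2.0.2, 5 seconds in 11.2.0.3) to show how many samples one node contributes over a period; it says nothing about how CHM stores or ages those samples, and `samples_per_node` is a hypothetical helper.

```python
# Sampling intervals stated on the slide, in seconds.
CHM_INTERVAL_SECONDS = {"11.2.0.2": 1, "11.2.0.3": 5}

def samples_per_node(release, hours):
    """Samples one node produces over `hours` at the release's interval."""
    return hours * 3600 // CHM_INTERVAL_SECONDS[release]

if __name__ == "__main__":
    for release in CHM_INTERVAL_SECONDS:
        print(release, samples_per_node(release, 24), "samples per node per day")
```

Over a day that is 86,400 samples per node in 11.2.0.2 versus 17,280 in 11.2.0.3, so the later release records one-fifth as many data points for the same window.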
• Useful for root cause analysis of node reboots/hangs, instance evictions, performance degradation, etc.
• The OTN version of CHM is incompatible with the 11.2.0.2 version: if you are on 11.2.0.2, you cannot install the OTN version.
• Uses OS APIs to collect metrics, reducing overhead
• Clusterware resource called ora.crf
• CHM doesn't require RAC or Clusterware
OS Watcher Black Box
• OS Watcher v4.0 has been renamed to OS Watcher Black Box (OSWbb)
• UNIX shell scripts for monitoring the OS (ps, top, mpstat, iostat, netstat, vmstat)
• Useful for diagnosing OS resource and performance problems, node reboots
• Should run on all nodes in a cluster
• Set up private interconnect monitoring
• Execute startOSWbb.sh arg1 arg2, where arg1 = snapshot interval in seconds and arg2 = archive retention in hours:
nohup ./startOSWbb.sh 60 48 &
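The two arguments to startOSWbb.sh together determine how much history is on disk. For `startOSWbb.sh 60 48`, each utility is sampled every 60 seconds and archives are kept for 48 hours, so roughly 48 * 3600 / 60 = 2,880 snapshots per utility are retained. A minimal Python version of that arithmetic (the helper name is hypothetical, and the figure is approximate because archives are purged in whole files):

```python
def snapshots_retained(interval_seconds, retention_hours):
    """Approximate snapshots of each OS utility kept in the OSWbb archive."""
    if interval_seconds <= 0 or retention_hours <= 0:
        raise ValueError("interval and retention must be positive")
    return retention_hours * 3600 // interval_seconds

if __name__ == "__main__":
    # Matches: nohup ./startOSWbb.sh 60 48 &
    print(snapshots_retained(60, 48))  # 2880
```

Halving the interval to 30 seconds doubles the retained snapshots (and the archive size) for the same 48-hour window.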
• Bundled with OS Watcher Black Box Analyzer (OSWbba)
• Requires Java 1.4.2 or greater
• Correlate OS statistics using the analyzer profile
• Generates graphs and reports for memory, cpu, disk
• Use CLI option to script profile generation for troubleshooting
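Before scripting analyzer runs, it helps to confirm the Java prerequisite. This Python sketch parses the version string printed by `java -version` (which writes to stderr) and compares it with the 1.4.2 minimum from the slide. The `-i <archive directory>` option in `oswbba_argv` is assumed here to be the analyzer's input-directory flag; check it against the OS Watcher Black Box Analyzer User Guide [ID 461053.1]. The helper names are hypothetical.

```python
import re

MIN_JAVA = (1, 4, 2)  # minimum Java version the slide gives for OSWbba

def parse_java_version(version_output):
    """Extract (major, minor, micro) from `java -version` output, e.g. 'java version "1.6.0_45"'."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?', version_output)
    if m is None:
        raise ValueError("no version string found")
    # Missing components (e.g. 'version "9"') count as 0.
    return tuple(int(part or 0) for part in m.groups())

def java_ok(version_output, minimum=MIN_JAVA):
    """True if the reported Java version meets the minimum."""
    return parse_java_version(version_output) >= minimum

def oswbba_argv(archive_dir, jar="oswbba.jar"):
    """Analyzer command line; assumption: -i names the OSWbb archive directory."""
    return ["java", "-jar", jar, "-i", archive_dir]
```

With `subprocess.run(["java", "-version"], capture_output=True, text=True)`, pass the result's `stderr` to `java_ok` before launching the analyzer.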
Screenshot: OSWbb Free Memory Graph
RACcheck – RAC Configuration Audit Tool
Screenshot: RACcheck output
• Assess the configuration of RAC, Clusterware and ASM
• Useful for pre-upgrade and post-upgrade system verification
• Uses “Best Practices” to report configuration problems – PASS/WARNING/FAIL/INFO
• Generates detailed and summary reports with scorecard
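RACcheck's scorecard condenses many individual checks into PASS/WARNING/FAIL/INFO counts. The Python sketch below shows that kind of roll-up over a hypothetical list of `(check_name, status)` pairs; it does not parse RACcheck's report files and does not reproduce RACcheck's own scoring.

```python
from collections import Counter

STATUSES = ("PASS", "WARNING", "FAIL", "INFO")

def tally(results):
    """Count check results by status. results: iterable of (check_name, status) pairs."""
    counts = Counter(status for _, status in results)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    return {status: counts.get(status, 0) for status in STATUSES}

def failures_first(results):
    """Order checks for review: FAIL, then WARNING, then INFO, then PASS (stable within a status)."""
    rank = {"FAIL": 0, "WARNING": 1, "INFO": 2, "PASS": 3}
    return sorted(results, key=lambda r: rank[r[1]])
```

Reviewing FAIL and WARNING items first matches how the tool is used for pre-upgrade and post-upgrade verification: the PASS entries need no action.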
Remote Diagnostic Agent (RDA)
• The diagnostic tool recommended by My Oracle Support (MOS)
• Collects a wealth of information based on configuration: OS, Clusterware and database logs
• Runs AWR/Statspack reports for performance problems
• Generates reports in HTML format
Procwatcher
• Debug Oracle & Clusterware processes using oradebug short_stack or an OS debugger (e.g. gdb, pstack)
• Run as the Oracle process owner to debug database processes, or as root for Clusterware processes
• Can be deployed as a Clusterware resource
• Useful for troubleshooting session hangs, severe performance problems and instance evictions
Procwatcher example
grid@node1[+ASM1]-/u02 >./prw.sh start all
Wed Feb 25 02:30:26 CDT 2012: Starting Procwatcher
Wed Feb 25 02:30:26 CDT 2012: Thank you for using Procwatcher. :-)
Wed Feb 25 02:30:26 CDT 2012: Please add a comment to Oracle Support Note 459694.1
Wed Feb 25 02:30:26 CDT 2012: if you have any comments, suggestions, or issues with this tool.
Wed Feb 25 02:30:26 CDT 2012: Started Procwatcher
ADRCI/Support Workbench
• Automatic Diagnostic Repository (ADR) stores database diagnostic information
• Package diagnostics files using ADRCI or Support Workbench
• Manages incidents and problems from alert logs
• Enterprise Manager provides a GUI to ADR called Support Workbench
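ADRCI can package an incident non-interactively through its `exec=` argument. This Python sketch builds that command line for `subprocess.run`; the homepath and incident number used below are placeholders (real values come from `show homes` and `show incident` inside adrci), and `adrci_pack_incident_argv` is a hypothetical helper.

```python
def adrci_pack_incident_argv(homepath, incident_id, out_dir):
    """Build an adrci command that packages one incident for Oracle Support.

    Equivalent shell form:
      adrci exec="set homepath <homepath>; ips pack incident <id> in <out_dir>"
    """
    if not isinstance(incident_id, int) or incident_id <= 0:
        raise ValueError("incident_id must be a positive integer")
    script = f"set homepath {homepath}; ips pack incident {incident_id} in {out_dir}"
    # Passed as a single argv element, so no shell quoting is needed.
    return ["adrci", f"exec={script}"]
```

For example, `adrci_pack_incident_argv("diag/rdbms/orcl/orcl1", 12345, "/tmp")` targets a placeholder database home and incident; run it as the owner of that ADR home.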
RACDIAG.SQL
• Gathers debug information for RAC session hangs
• One-time data capture
• Performs hanganalyze dumps
• Certain types of hangs will prevent it from running
OS Utilities
• truss/strace – trace system calls and signals
• pstack – dump a process's stack trace
• pmap/procmap – map process memory
• nmon/nmon analyzer – collect and analyze OS stats
• collectl/collectl-utils – collect and analyze OS stats
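On Linux, pmap reports from the per-process mapping table in /proc/<pid>/maps, so a rough memory-by-mapping summary can be sketched directly from that file. This Python version is Linux-only, reads the current process by default, and sums virtual address-space sizes (not resident memory); the function names are hypothetical.

```python
def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (size_bytes, perms, name)."""
    # Fields: address perms offset dev inode [pathname]; pathnames may contain spaces.
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    name = fields[5].strip() if len(fields) > 5 else "[anon]"
    return end - start, fields[1], name

def memory_by_mapping(pid="self"):
    """Sum mapped virtual bytes per backing file/region for a process (Linux only)."""
    totals = {}
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            size, _, name = parse_maps_line(line)
            totals[name] = totals.get(name, 0) + size
    return totals

if __name__ == "__main__":
    top = sorted(memory_by_mapping().items(), key=lambda kv: kv[1], reverse=True)[:5]
    for name, size in top:
        print(f"{size // 1024:>10} KB  {name}")
```

Pass a process ID such as that of an `ora_smon` background process instead of `"self"` to inspect another process; reading another user's maps requires the same privileges pmap would need.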
Summary
Tool/Utility     Instance Evictions  Node Reboots  Clusterware Problems  RAC Performance
diagcollection   ✓                   ✓             ✓                     ✗
cluvfy           ✗                   ✗             ✓                     ✗
CHM              ✓                   ✓             ✓                     ✓
OSWbb/OSWbba     ✓                   ✓             ✓                     ✓
RDA              ✓                   ✓             ✓                     ✓
RACcheck         ✓                   ✓             ✓                     ✗
Procwatcher      ✓                   ✗             ✓                     ✓
ADRCI/SW         ✗                   ✗             ✗                     ✓
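The summary table can also be kept as data, for example in a runbook script that suggests where to start for a given symptom. This Python sketch transcribes the table's check marks as booleans; `tools_for` is a hypothetical helper, and the matrix carries only the presenter's judgments from the slide.

```python
PROBLEMS = ("instance evictions", "node reboots", "clusterware problems", "RAC performance")

# Rows transcribed from the summary table: True means the tool is marked useful.
MATRIX = {
    "diagcollection": (True, True, True, False),
    "cluvfy": (False, False, True, False),
    "CHM": (True, True, True, True),
    "OSWbb/OSWbba": (True, True, True, True),
    "RDA": (True, True, True, True),
    "RACcheck": (True, True, True, False),
    "Procwatcher": (True, False, True, True),
    "ADRCI/SW": (False, False, False, True),
}

def tools_for(problem):
    """Return the tools the summary table marks as useful for a problem type."""
    column = PROBLEMS.index(problem)  # raises ValueError for unknown problem types
    return [tool for tool, row in MATRIX.items() if row[column]]
```

For instance, `tools_for("node reboots")` lists diagcollection, CHM, OSWbb/OSWbba, RDA and RACcheck, in table order.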
MOS Notes
• OS Watcher Black Box User Guide [ID 301137.1]
• OS Watcher Black Box Analyzer User Guide [ID 461053.1]
• Data Gathering for Troubleshooting Oracle Clusterware (CRS or GI) Issues [ID 289690.1]
• CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide [ID 330358.1]
• Diagnosability for Oracle Clusterware (CRS or Grid Infrastructure) Component and Resource [ID 357808.1]
• Data Gathering for Troubleshooting RAC Issues [ID 556679.1]
• Cluster Health Monitor (CHM) FAQ [ID 1328466.1]
• Introducing Cluster Health Monitor (IPD/OS) [ID 736752.1]
• RACcheck - RAC Configuration Audit Tool [ID 1268927.1]
• Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes [ID 459694.1]
• Script to Collect RAC Diagnostic Information (racdiag.sql) [ID 135714.1]
Contact Information
•Website - blogs.griddba.com
•LinkedIn – Leighton Nelson
•Twitter - @leight0nn