Angeline Janet Dhanarani - Oracle · 44 Intel’s DB Monitoring Profile & Challenges Intel’s...
Angeline Janet Dhanarani
Senior Product Manager, Oracle Enterprise Manager
October 29, 2015
Deepen Chakraborty
Enterprise Architect
Technology Manufacturing Group
Intel Corporation
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Practical Tips for Oracle Enterprise Manager High Availability and Diagnostics
Total Cloud Control
• Integrated Cloud Stack Management: Optimized, Efficient
• Complete Cloud Lifecycle Management: Agile, Automated
• Superior Enterprise-Grade Management: Scalable, Secure
Next Release Builds on a Solid Foundation
• Integrated Cloud Stack Management: Optimized, Efficient
• Complete Cloud Lifecycle Management: Agile, Automated
• Superior Enterprise-Grade Management: Scalable, Secure
• NEW: Continuous Monitoring
• NEW: Infrastructure Management
• NEW: Improved Hybrid Cloud Management
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda
1 Best Practices on High Availability and Disaster Recovery
2 Monitoring and Diagnostics
Best practices on High Availability and Disaster Recovery
Enterprise Manager Architecture Overview
[Architecture diagram. Components shown: Agents with plug-ins on targets, Oracle Management Service (OMS), Oracle Management Repository (OMR), Console, EMCLI, Software Library, Connectors, BI Publisher, Notifications, Oracle Store/MOS, and a firewall. Connection labels: HTTP(S), JDBC.]
High Availability Levels in Enterprise Manager
• Level 1
  – Very minimal failure protection
  – Low implementation cost
  – Recommended for development/test instances
• Level 2
  – Offers protection against OMS node failure
  – Does not offer scalability
  – Automatic failure detection and automated failover of the OMS node
• Level 3
  – Offers component-level failure protection
  – Offers scalability
  – No site-level failure protection
• Level 4
  – Disaster recovery
  – Maximum protection (even against site failures)
  – Minimized downtime
  – Higher cost for setup
  – Recommended for 24x7 production instances
[Diagrams, one per level. Level 1: users and EMCLI, OMS, OMR. Level 2: active OMS and passive OMS, NFS, OMR with OMR local standby. Level 3: users and EMCLI behind a load balancer, two active OMSes, OMR with RAC, OMR local standby. Level 4: active site and passive site behind a global load balancer/DNS.]
High Availability Best Practices for Enterprise Manager
[Table: best practices mapped to High Availability Levels 1 to 4]
• Configure backups of OMS configuration: emctl exportconfig
• Configure backups: emkey, repository, Software Library
• Configure RAC for repository high availability per site
• Configure additional OMSes per site
• Configure highly available storage
• Configure Data Guard for data replication
• Configure continuous storage replication
Disaster Recovery Solution for Enterprise Manager
• Storage replication is the only solution supported for disaster recovery
• Standby WebLogic Domain is no longer supported from EM 13c
  – Not supported by FMW 12.1.3 (used in Enterprise Manager 13c)
  – Deprecation notice given in EM 12.1.0.3
• Migration from Standby WebLogic Domain to a storage replication topology
  – Contact Oracle Support for assistance
• Two supported methodologies for storage replication
  – NAS storage replication
  – ACFS Server Replication (certification in progress)
Enterprise Manager Disaster Recovery Solution
Disaster Recovery Solution Configuration with NAS Storage Replication
[Diagram: DNS lookup in front of the server load balancers of the primary (active) and standby (passive) data centers. Primary site: primary OMS, additional OMS1, EM repository, and storage with OMS, OMS1, Swlib, and BIP shares. Standby site: physical standby repository and storage with OMS, OMS1, Swlib, and BIP shares. Labels: DB replication with Data Guard from primary to standby; continuous storage replication.]
Enterprise Manager Disaster Recovery Solution
Disaster Recovery Solution Configuration with ACFS Replication
[Diagram: DNS lookup in front of the server load balancers of the primary (active) and standby (passive) data centers. Each site: EM repository, ASM storage with HANFS configuration, and ACFS volumes for OMS, OMS1, Swlib, and BIP. Primary site also shows primary OMS and additional OMS1. Labels: DB replication with Data Guard from primary to standby; ACFS replication.]
High Availability Solution for Enterprise Manager on Windows Platform
• Cold Failover Cluster (active/passive) solution for OMS and Agent on Windows
  – Certified on Windows 2012 R2
[Diagram label: Microsoft Windows Server Failover Clustering]
Monitoring and Diagnostics
Enterprise Manager Health Monitoring
Early Warning Symptoms
• Console Users
  – Slow access to console
  – Slow login
  – Slow response
  – Intermittent connectivity issues
• Target Administrators (DBAs)
  – Delay in notification alerts
  – Alerts not triggered
  – Events not triggered
• Patching Administrators
  – Patch application very slow
  – Blacking out takes more time
• EM Administrators
  – Slow job executions
  – Jobs failed/suspended
  – OMS crashes/restarts
  – Availability status pending computation
Enterprise Manager Health Monitoring
Diagnostic Workflow
• Monitor-the-monitor (self-monitoring features) is the starting point of this investigation
[Flowchart. Decision points: Is your system capacity sized appropriately? Has the inflow of load increased in Enterprise Manager? Has metric data or event data inflow increased? Do you want the changes? Would you like to optimize performance? Actions: follow the sizing recommendation for your configuration (Small/Medium/Large); analyze what has changed (more targets? more metrics?); identify the target/metric and turn off the collection, and set an appropriate alert threshold; identify the sub-system needing attention; identify the sub-system and introduce RAC services.]
Is your system capacity sized appropriately?
• Refer to the Sizing Recommendations
  – Ensure Java heap size settings are optimal for the OMS
  – Ensure hardware requirements are met
    • Memory and storage
  – Ensure database parameter settings are compliant for your configuration
  – Ensure network latency between OMS and repository is less than 1 ms
    • Agent-to-OMS communication can tolerate much higher latencies
Setup -> Manage Cloud Control -> Repository
Heap size parameters: OMS_HEAP_MIN, OMS_HEAP_MAX, OMS_PERMGEN_MIN, OMS_PERMGEN_MAX
Does the Enterprise Manager Repository need attention?
• Check the Top Wait Events metrics for the Enterprise Manager repository
  – The log file sync wait event should not appear in the top 5 waits
    • Correlate with AWR report findings to see whether the impact comes from a specific sub-system
    • Could also indicate an increase in load inflow
  – DB CPU should be the topmost wait event
  – Expect to see I/O-related wait events, but only writes
    • Check the SGA buffer cache size if there are read wait events in the top 5
  – For an ideal repository host (single instance or RAC cluster): no more than 40% CPU utilization on average across all nodes in RAC
Ideal values for log file sync wait time:
  Engineered system: ~1 millisecond
  Non-engineered system: ~5 milliseconds or less
12.1.0.5 onwards
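The repository checks on this slide can be sketched as a small Python helper. This is only an illustration: the function name, the input shapes (an ordered list of top wait event names and an average log file sync time in milliseconds), and the event-name matching are assumptions, not an Enterprise Manager API. The thresholds are the ones quoted on the slide.

```python
# Illustrative sketch; names and input shapes are hypothetical, not an EM API.
# Thresholds are the slide's ideal values for log file sync wait time.
LOG_FILE_SYNC_IDEAL_MS = {"engineered": 1.0, "non_engineered": 5.0}


def repository_findings(top_waits, log_file_sync_ms, system="non_engineered",
                        avg_cpu_pct=None):
    """Return a list of findings for the repository database.

    top_waits: wait event names ordered from highest to lowest time.
    log_file_sync_ms: average 'log file sync' wait time in milliseconds.
    avg_cpu_pct: average CPU utilization across RAC nodes, if known.
    """
    findings = []
    top5 = [w.lower() for w in top_waits[:5]]

    if "log file sync" in top5:
        findings.append("log file sync is in the top 5 waits")
    if top5 and top5[0] != "db cpu":
        findings.append("DB CPU is not the topmost event")
    # Assumption: any top-5 event whose name contains 'read' is a read wait.
    if any("read" in w for w in top5):
        findings.append("read waits in top 5: check SGA buffer cache size")
    if log_file_sync_ms > LOG_FILE_SYNC_IDEAL_MS[system]:
        findings.append("log file sync wait time above ideal value")
    if avg_cpu_pct is not None and avg_cpu_pct > 40:
        findings.append("average CPU utilization above 40%")
    return findings
```

An empty list means none of the slide's warning signs were seen in the supplied figures; it does not replace reading the AWR report itself.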
Does the Job sub-system need attention?
• Monitor the EM Jobs Service performance charts
  – Look for a growing backlog in job steps scheduled at the repository
  – Look for consistently high Job Dispatcher processing time (%) with low throughput, which indicates a processing bottleneck
  – If the backlog is consistently growing, ensure job pool threads are sized per the sizing recommendation
  – Correlate with cluster wait times in the AWR report
Does the Loader sub-system need attention?
• Loader throughput, indicating processing time
  – Look for a consistent, drastic increase over a time period
• Agent backlog
  – Check for a consistent increase in back-off requests/backlogs
• Correlate with the Loader Statistics Report
  – Check SLB configuration (recommended: round-robin algorithm)
  – Increase the loader pool size uniformly across OMSes
    • If there are consistent back-offs in the range of hundreds per hour
Setup -> Manage Cloud Control -> Health Overview
Has the inflow of load increased on Enterprise Manager?
• Crucial for a well-tuned Enterprise Manager
  – Incoming metric data
  – Incoming event data
• Proactive measure to check for an increase in the count of targets and agents
  – Repository-side Metric Extension to alert if the number of targets and agents increases above the recommended sizing guidelines for the installed configuration
    • Set the alert threshold to the installed configuration
      – EVAL / SMALL / MEDIUM / LARGE
    • Refer to the Appendix for configuration
  – Re-evaluate the sizing configuration when alerted
Key Indicators of increasing inflow of Metric Data
Uncontrolled metric data generation causes significant performance degradation of Enterprise Manager
Fine tuning incoming metric data
• Best practice recommendation: collect only what you need
  – Evaluate what the system is collecting and how frequently it is collecting it
    • Enable/disable metrics to prevent unwanted metrics from being collected
      – Use monitoring templates to turn them off
      – Identify out-of-date monitoring templates and metric extensions to reduce inflow
    • Reduce the collection frequency of non-critical metrics
• Weighted cost of metric collection is analyzed and provided out of the box
  – Daily analysis for each target and each metric
  – Outlier targets and metrics violating the upper bound are identified daily
  – SDK views available for configuring a BIP report
Key indicators of increasing inflow of Event data
• Look out for consistent drastic increase in
– Metric alert backlog
– Metric Collection Errors
– Notification Delivery Backlog
Setup -> Manage Cloud Control -> Repository -> Metrics
Fine tuning Incoming Event data
• Events are a relatively expensive resource to load into the repository
  – Not expensive on the agent, but once loaded into the repository they trigger a lot of processing
    • Notification processing
    • Callback processing
• Reduce metric alerts by setting appropriate warning and critical thresholds
• Set the number of occurrences to rule out sporadic alerts
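How an occurrence count rules out sporadic alerts can be illustrated with a short Python sketch. It is a simplified model that assumes an alert is raised only after the threshold is violated in N consecutive collections; the function and parameter names are hypothetical, and this is not Enterprise Manager's implementation.

```python
def alerts_with_occurrences(samples, critical, occurrences):
    """Yield the index of each sample at which an alert would be raised.

    An alert is raised only when `occurrences` consecutive samples exceed
    `critical`, and at most once per violation streak.
    """
    streak = 0
    for i, value in enumerate(samples):
        if value > critical:
            streak += 1
            if streak == occurrences:
                yield i
        else:
            streak = 0  # a good sample resets the streak


# Hypothetical CPU samples with two isolated spikes and one sustained spike.
cpu = [95, 40, 96, 97, 98, 50, 99]
print(list(alerts_with_occurrences(cpu, critical=90, occurrences=1)))  # [0, 2, 6]
print(list(alerts_with_occurrences(cpu, critical=90, occurrences=3)))  # [4]
```

With an occurrence count of 1 the sample data produces three events to load and process; with a count of 3 only the sustained violation does.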
Tuning Critical Subsystems in Enterprise Manager
• Ping Service
• Repository Scheduler Jobs: Rollups
• Job Subsystem
• Events Subsystem
Optimizing Performance and Improving Scalability with RAC Service
Tuning Critical Subsystems in Enterprise Manager
Improving performance and scalability
• Ping Service
  – Performance is crucial for determining target availability
    • Not CPU heavy, but processing is very frequent
    • Every minute, every agent pings the OMS and updates a row in the EMD table
  – Performance improves several-fold if the Ping Service is pinned to a single RAC node
    • A single-node service eliminates cluster wait events
  – General recommendation
    • Anyone running RAC for the Enterprise Manager repository should implement a ping service and pin it to a single RAC node
    • Configuration steps are in the Appendix
Tuning Critical Subsystems in Enterprise Manager
Improving performance and scalability
• Repository scheduler jobs: metric rollups
  – Aggregation mechanism for historical trending and capacity planning
    • Background process; the data is not required immediately
    • I/O-intensive operation
  – If this job runs for more than 6 hours, the general rule is to add another thread, provided the database can handle the increased load from these threads
  – General recommendation
    • Configure a rollup service only if there are cluster wait events
    • Check whether a single RAC node can handle the large I/O volume
    • Pin the rollup service to a single RAC node
    • Configuration steps are in the Appendix
Tuning Critical Subsystems in Enterprise Manager
Improving performance and scalability
• Job subsystem
  – Jobs are comparatively resource-intensive at the database
    • If there are a lot of running jobs, expect to see application lock wait events and transaction locking wait events in the repository AWR report
    • The job system uses the locks to maintain sequence
    • Normal if they consume 5-8% of the waiting time; a problem if over 20-30%
  – General recommendation
    • Improve performance by pinning the Job Service to two RAC node instances
• Events subsystem
  – Events are comparatively resource-intensive at the database
  – General recommendation
    • Improve performance by pinning the Events Service to a single RAC node instance
• Configuration steps for the Job Service and Event Service are in the Appendix
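The lock-wait rule of thumb for the job subsystem can be written as a small Python check. This is a sketch under stated assumptions: the function name is hypothetical, the inputs are wait times you would read from an AWR report yourself, and the cut-offs of 8% and 20% are taken from the ends of the ranges on the slide. The slide gives no guidance for values in between, so that band is labeled "watch" here as an assumption.

```python
def job_lock_wait_status(lock_wait_seconds, total_wait_seconds):
    """Classify the share of wait time spent on job-system lock waits.

    Returns (percentage, status), where status is 'normal', 'watch',
    or 'problem'.
    """
    if total_wait_seconds <= 0:
        raise ValueError("total_wait_seconds must be positive")
    pct = 100.0 * lock_wait_seconds / total_wait_seconds
    if pct <= 8:
        status = "normal"    # slide: 5-8% of waiting time is normal
    elif pct >= 20:
        status = "problem"   # slide: over 20-30% is a problem
    else:
        status = "watch"     # assumption: no guidance given for 8-20%
    return pct, status


print(job_lock_wait_status(6, 100))   # (6.0, 'normal')
print(job_lock_wait_status(25, 100))  # (25.0, 'problem')
```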
Appendix
Configuring RAC services for Ping Service
– Create a database service and set affinity so that it runs on a single RAC node
• Create database service "ping" and set one of the RAC instances as the primary instance in "-r"
  – srvctl add service -d <dbname> -s ping -r <primary instance> -a <the other instances> -y automatic
• Execute the following DBMS_SCHEDULER steps
  – As sys user, execute DBMS_SCHEDULER.create_job_class( job_class_name => 'PING', service => 'ping')
  – GRANT EXECUTE ON sys.PING TO sysman;
  – As sysman user, execute DBMS_SCHEDULER.SET_ATTRIBUTE ( name => 'EM_PING_MARK_NODE_STATUS', attribute => 'job_class', value => 'PING')
  – As sysman user, execute DBMS_SCHEDULER.SET_ATTRIBUTE ( name => 'EM_REPOS_SEV_EVAL', attribute => 'job_class', value => 'PING')
  – As sysman user, execute GC_SCHED_JOB_REGISTRAR.SET_JOB_CLASS('EM_REPOS_SEV_EVAL', 'PING')
  – As sysman user, execute GC_SCHED_JOB_REGISTRAR.SET_JOB_CLASS('EM_PING_MARK_NODE_STATUS', 'PING')
• Set the connect string with the 'ping' service name in emctl property "oracle.sysman.core.omsAgentComm.ping.connectionService.connectDescriptor"
  – Sample: emctl set property -name "oracle.sysman.core.omsAgentComm.ping.connectionService.connectDescriptor" -value "\(DESCRIPTION=\(ADDRESS_LIST=\(ADDRESS=\(PROTOCOL=TCP\)\(HOST=xxx.us.oracle.com\)\(PORT=1521\)\)\)\(CONNECT_DATA=\(SERVICE_NAME=ping\)\)\)"
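The sample value is an ordinary connect descriptor with every parenthesis backslash-escaped for the shell. A minimal Python sketch of building such a command string follows; the helper names are hypothetical, the host is the placeholder from the slide, and the script only prints the command rather than running emctl.

```python
def connect_descriptor(host, port, service_name):
    """Build a plain connect descriptor for a single TCP address."""
    return (
        "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)"
        f"(HOST={host})(PORT={port})))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


def escape_parens(text):
    """Backslash-escape parentheses, as in the slide's sample value."""
    return text.replace("(", r"\(").replace(")", r"\)")


def emctl_set_property_command(prop, host, port, service_name):
    """Return the emctl command line as a string (it is not executed)."""
    value = escape_parens(connect_descriptor(host, port, service_name))
    return f'emctl set property -name "{prop}" -value "{value}"'


print(emctl_set_property_command(
    "oracle.sysman.core.omsAgentComm.ping.connectionService.connectDescriptor",
    "xxx.us.oracle.com", 1521, "ping"))
```

The same helper can produce the value for any other service name by changing the property and `service_name` arguments.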
Configuring RAC services for Rollups
– Create a database service and set affinity so that it runs on a single RAC node
• Create database service "rollup" and set one of the RAC instances as the primary instance in "-r"
  – srvctl add service -d <dbname> -s rollup -r <primary instance> -a <the other instances> -y automatic
  – srvctl start service -d <dbname> -s rollup
  – srvctl status service -d <dbname>
• As sys user, execute DBMS_SCHEDULER.create_job_class( job_class_name => 'ROLLUP', service => 'rollup')
• GRANT EXECUTE ON sys.ROLLUP TO sysman;
• As sysman user, execute DBMS_SCHEDULER.SET_ATTRIBUTE ( name => 'EM_ROLLUP_SCHED_JOB', attribute => 'job_class', value => 'ROLLUP')
• As sysman user, execute GC_SCHED_JOB_REGISTRAR.SET_JOB_CLASS('EM_ROLLUP_SCHED_JOB', 'ROLLUP')
Configuring RAC services for Jobs subsystem
– Create a database service and set affinity so that it runs on two RAC node instances
• Create database service "emjob" and set two of the RAC instances as primary instances in "-r"
  – srvctl add service -d <dbname> -s emjob -r <primary instances> -a <the other instances> -y automatic
• Execute the following DBMS_SCHEDULER steps
  – As sys user, execute DBMS_SCHEDULER.create_job_class( job_class_name => 'EMJOB', service => 'emjob')
  – GRANT EXECUTE ON sys.EMJOB TO sysman;
  – As sysman user, execute DBMS_SCHEDULER.SET_ATTRIBUTE ( name => 'EM_JOBS_STEP_SCHED', attribute => 'job_class', value => 'EMJOB')
  – As sysman user, execute DBMS_SCHEDULER.SET_ATTRIBUTE ( name => 'EM_JOB_PURGE_POLICIES', attribute => 'job_class', value => 'EMJOB')
  – As sysman user, execute GC_SCHED_JOB_REGISTRAR.SET_JOB_CLASS('EM_JOBS_STEP_SCHED', 'EMJOB')
  – As sysman user, execute GC_SCHED_JOB_REGISTRAR.SET_JOB_CLASS('EM_JOB_PURGE_POLICIES', 'EMJOB')
  – INSERT INTO MGMT_PARAMETERS(parameter_name, parameter_value) VALUES ('EM_jobs_step_sched_job_class', 'EMJOB');
• Set the connect string with the 'emjob' service name in emctl property "oracle.sysman.core.jobs.conn.service"
  – Sample: emctl set property -name "oracle.sysman.core.jobs.conn.service" -value "\(DESCRIPTION=\(ADDRESS_LIST=\(ADDRESS=\(PROTOCOL=TCP\)\(HOST=xxx.us.oracle.com\)\(PORT=1521\)\)\)\(CONNECT_DATA=\(SERVICE_NAME=emjob\)\)\)"
Configuring RAC services for Events subsystem
– Create a database service and set affinity so that it runs on a single RAC node
• Create database service "event" and set one of the RAC instances as the primary instance in "-r"
  – srvctl add service -d <dbname> -s event -r <primary instance> -a <the other instances> -y automatic
• Set the connect string with the 'event' service name in emctl property "oracle.sysman.core.events.connectDescriptor"
  – Sample: emctl set property -name "oracle.sysman.core.events.connectDescriptor" -value "\(DESCRIPTION=\(ADDRESS_LIST=\(ADDRESS=\(PROTOCOL=TCP\)\(HOST=xxx.us.oracle.com\)\(PORT=1521\)\)\)\(CONNECT_DATA=\(SERVICE_NAME=event\)\)\)"
Repository-side Metric Extension to compute increase in target and agents count
SELECT outer1.target_Guid,
       CASE
         WHEN outer1.TargetCount < 100 AND outer1.AgentCount < 10 THEN 'EVAL'
         WHEN outer1.TargetCount < 1000 AND outer1.AgentCount < 100 THEN 'SMALL'
         WHEN outer1.TargetCount < 10000 AND outer1.AgentCount < 1000 THEN 'MEDIUM'
         ELSE 'LARGE'
       END AS EvnAlert
FROM (SELECT inner2.target_Guid, inner1.TargetCount, inner1.AgentCount
      FROM (SELECT Count(*) TargetCount,
                   SUM(CASE WHEN target_type='oracle_emd' THEN 1 ELSE 0 END) AS AgentCount
            FROM mgmt$Target) inner1,
           (SELECT Target_Guid FROM mgmt$Target WHERE target_type='oracle_emrep') inner2
     ) outer1
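The CASE expression in the query maps the current target and agent counts to a sizing label. The same logic, sketched in Python for quick what-if checks, is shown below; the function name is hypothetical, and the boundaries are copied from the query rather than from any sizing guide.

```python
def installed_size(target_count, agent_count):
    """Mirror the query's CASE expression: counts in, sizing label out."""
    if target_count < 100 and agent_count < 10:
        return "EVAL"
    if target_count < 1000 and agent_count < 100:
        return "SMALL"
    if target_count < 10000 and agent_count < 1000:
        return "MEDIUM"
    return "LARGE"


print(installed_size(50, 5))      # EVAL
print(installed_size(50, 50))     # SMALL
print(installed_size(5000, 50))   # MEDIUM
print(installed_size(20000, 5))   # LARGE
```

Both counts must stay under a tier's limits for that tier to apply, so exceeding either the target or the agent limit moves the result up a tier. That is what lets an alert threshold set to the installed configuration fire when either count grows.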
Monitoring Setting: [screenshot]
Proxy servers: Configuring High Availability
• Multiple proxies can be configured for communication from OMS to Agents for high availability
  – A proxy server is a target
  – If a proxy server is down, the OMS uses an alternative proxy server
  – Adding proxies no longer requires an OMS bounce
• Different proxy servers can be configured for agents in different networks
  – Agents are associated with proxy servers by name or pattern during proxy server creation
[Diagram: OMS1 and OMS2 in the corporate network, a corporate firewall, and Proxy Servers 1, 2, and 3. Agent groups shown: agents in Network 1, agents in Network 2, and agents not in any network. Agent labels: A1 to A4, B1 to B4, C1 to C3.]
Highly Available Enterprise Manager 12c Implementation Using Intel Xeon Based Servers
Deepen Chakraborty
Enterprise Architect
Technology Manufacturing Group
Intel Corporation
October 29, 2015
Agenda
Intel's DB Monitoring Profile and Challenges
Enterprise Manager 10g Grid Control vs. Enterprise Manager 12c Cloud Control
Centralized Enterprise Manager 12c Cloud Control Architecture
Critical subsystems monitoring results
Lessons Learned
Conclusion
Intel’s DB Monitoring Profile & Challenges
Intel's factory automation databases are used for making critical manufacturing decisions (operational and planning, engineering analysis, process control)
Automation databases include both mission-critical OLTP and mission-important DSS-type systems, ranging from a few hundred GB up to 30 TB, spread across the US and Asia
Application clients include both vendor and homegrown apps (ODP.NET based)
Databases were monitored using around 14 Grid Control implementations
It was very hard to keep 14 installations on the same monitoring and alerting setup
Patching and monitoring the implementations took a long time and was resource intensive
Oracle 10g Grid Control vs. Enterprise Manager 12c Cloud Control
[Diagram: monitored sites. Malaysia: Sites 1 to 4; China: Sites 1 to 3; Oregon: Sites 1 and 2; Arizona: Sites 1 and 2; Vietnam: Site 1; Israel: Site 1.]
Enterprise Manager 12c Architecture Box Diagram
Centralized Enterprise Manager 12c Capabilities
Can monitor and use all features of 11gR2 and 12c databases through EM12c Cloud Control, e.g. real-time query performance, Real-Time ADDM, reporting, and metrics capabilities
High availability (database and Oracle Management Server)
Fully automated target setup using the dynamic group feature
Robust security (super user, admin user, read-only user)
Target control from target nodes instead of Cloud Control nodes
Target database high availability (FSFO capability for targets) and backup/recovery have no central dependency
EM12c Reference Architecture
Stack Component: Technology Description
OMR: Oracle 11g R2 64-bit (11.2.0.4 P11)
HA/DR of OMR: Broker-managed Oracle Data Guard (Max Availability mode) in a Fast-Start Failover configuration; Oracle Restart (part of Grid Infrastructure install)
Observer for OMR: On Recovery Manager node; custom script (OBMAN) to automatically move the Observer away from the PRY data center upon role transition
Backup of OMR: RMAN integration with Cloud Control; site-based RCAT for all OLTP apps; basic compressed backup sets, weekly full and daily incrementals; backups run on PRY (with Block Change Tracking)
OMS: Oracle Enterprise Manager Cloud Control 12.1.0.4
OMS HA: Active/passive Microsoft Failover Cluster for Windows 2008 R2 with HP Remote Copy
OMS Backup: Using script for configuration
OS: 64-bit Windows 2008 R2 Server
Storage/File System: HP 3PAR SAN; OMR app-specific data/index, backup, redo, archive, control on ASM; Oracle and Grid Infrastructure (ASM) binaries on NTFS; OMS file system on NTFS
Network: Teamed private network, 1 GigE, amongst the OMR nodes; distance between data centers typically < 1 km; network latency 2 ms; public network, 10 GigE, between the OMR and OMS cluster nodes
Server Config: Windows 2008 R2 Service Pack 1; BL460c Gen8 with 2 sockets, 8-core Intel® Xeon® processors; 192 GB RAM; Hyper-Threading enabled
[Screenshot] OMS Performance Capture: 1
[Screenshot] OMS Performance Capture: 2
[Screenshot] OMS Performance Capture: 3
[Screenshot] OMS Performance Capture: 4 (Top 25 Metric Data loading)
[Screenshot] OMR Performance Capture
Lessons Learned
Intel created homegrown custom scripts/utilities to:
– Auto-relocate the FSFO Observer (to the data center of the current PSB) upon role transition
– Upload OMA dynamic properties so that Cloud Control reflects the real state of PRY/PSB
– Monitor site databases if the central OMS/OMR goes down
The Oracle development team helped by providing fixes/patches to:
– Automatically transfer EM12c RMAN jobs to the new primary upon Data Guard planned switchover and unplanned failover
– Implement a workaround for Recovery Catalog HA (an enhancement will be available in the next EM release)
– Provide a custom OMS graceful failover script for the Microsoft Windows 2008 R2 Failover Cluster setup
Conclusion
Oracle Enterprise Manager 12c provides seamless, integrated monitoring of the Oracle stack, including Data Guard management with FSFO and backup and recovery
Oracle EM12c can monitor databases in geographically dispersed locations
The HA and security implementations of the Enterprise Manager 12c stack provide seamless monitoring of databases
Thanks to the TMG engineering team for their contributions. Thank you for attending the session.
Application Database Blueprint
[Diagram: Data Center 1 with the primary DB instance and Data Center 2 with the standby DB instance, each on Grid Infrastructure with ASM Data & FRA, connected by public, private, and storage networks. Labels: Broker-enabled Data Guard/SYNC; OBSERVER/OB-MAN on OEM GC node (both data centers); mirrored LUN of redo/control/archive logs for double failure coverage.]
Observer and Recovery Catalog Database: In Each Site
[Diagram: Data Center 1 with the primary DB instance and Data Center 2 with the standby DB instance, each on Grid Infrastructure with ASM Data & FRA, connected by public, private, and storage networks. Labels: Broker-enabled Data Guard/SYNC; OBSERVER/OB-MAN (both data centers).]
Data Guard (Broker-enabled) Fast-Start Failover zero data loss configuration (SYNC / Max Availability mode / Real-Time Apply). For different database versions, multiple observers run from different ORACLE_HOMEs.
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.