Angeline Janet Dhanarani - Oracle Cloud · •Key features : – Single Instance databases (...


Copyright © 2016, Oracle and/or its affiliates. All rights reserved.

Angeline Janet Dhanarani, Principal Product Manager, Oracle Enterprise Manager
Joe Kopilash, Architect and Director, Database Administration, EPSILON
September 22, 2016

Copyright © 2015, Oracle and/or its affiliates. All rights reserved.

Tips for Maximizing Reliability and Scalability of Oracle Enterprise Manager [CON6988]


Complete Cloud Control

Integrated Cloud & On-premise Stack Management: Optimized, Efficient

Complete Cloud & On-premise Lifecycle Management: Agile, Automated

Superior Enterprise-Grade Management: Scalable, Secure


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Program Agenda

1) Best practices in configuring highly available and scalable deployment

2) Top 10 strategy tips for achieving “Economies of Scale”


Best practices in configuring highly available and scalable deployment


Enterprise Manager Architecture Overview

[Figure: architecture diagram. Labeled components: Console, EMCLI, Connectors, Oracle Store/MOS, Notifications, Firewall, Oracle Management Service (OMS), Always-on Monitoring, Oracle Management Repository (OMR), Software Library, and Agent with Plug-ins and Targets. Labeled protocols: HTTP(S) and JDBC.]


High Availability Consideration Metrics

• Three prime components determining the choice of high availability option
  – Recovery Time Objective
  – Recovery Point Objective
  – Availability Percentage

High Availability Options Available

Level     Configuration
Level 1   Single OMS; SI Repository DB Instance
Level 2   Active/Passive OMS; DG (local)
Level 3   Active/Active OMS; RAC + DG (local)
Level 4   Global Load Balanced; Multi-site (DR)

[Figure: the four levels plotted against Recovery Point Objective, Recovery Time Objective, and availability. Labeled axis values: 0, 0%, 100% Availability.]


Recovery Point Objective And Recovery Time Objective

[Figure: timeline centered on "Disaster Strikes". The Recovery Point side ends at "Critical data is recovered"; the Recovery Time side ends at "Systems recovered and operational". Both sides are scaled in Secs, Mins, Hrs, Days, Wks.]

Recovery Point Objective (RPO) measures the ability to recover files by specifying a point-in-time restore of the backup copy. The metric indicates the amount of data at risk of being lost: how current or fresh is the data after recovery?

Recovery Time Objective (RTO) measures the time it takes for a system to be completely up and running after a disaster. The metric indicates the downtime: how quickly can systems and data be recovered?


Recovery Point Objective And Recovery Time Objective

[Figure: the same "Disaster Strikes" timeline (Recovery Point on one side, Recovery Time on the other, each scaled in Secs, Mins, Hrs, Days, Wks), annotated with Level 1 through Level 4 on both sides and an "Increasing Cost $$$" arrow on each side.]


Availability (Number of 9’s)

• Availability: expressed as a percentage of uptime in a given year

Availability %             Downtime per year   Downtime per month   Downtime per week   Downtime per day
90% ("one nine")           36.5 days           72 hours             16.8 hours          2.4 hours
99% ("two nines")          3.65 days           7.20 hours           1.68 hours          14.4 minutes
99.9% ("three nines")      8.76 hours          43.8 minutes         10.1 minutes        1.44 minutes
99.99% ("four nines")      52.56 minutes       4.38 minutes         1.01 minutes        8.64 seconds
99.999% ("five nines")     5.26 minutes        25.9 seconds         6.05 seconds        864 milliseconds

Availability = Uptime / (Uptime + Downtime)
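The downtime figures in the table follow directly from the availability formula. The sketch below reproduces them in Python; the function names and the period lengths (365-day year, 30-day month) are assumptions chosen to match the table, not something stated on the slide.

```python
from datetime import timedelta

# Assumed period lengths: a 365-day year and a 30-day month match the table values.
PERIODS = {
    "year": timedelta(days=365),
    "month": timedelta(days=30),
    "week": timedelta(days=7),
    "day": timedelta(days=1),
}


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Downtime budget for a period: (1 - availability) * period length."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period * (1 - availability_pct / 100)


def availability(uptime: timedelta, downtime: timedelta) -> float:
    """Availability = Uptime / (Uptime + Downtime), returned as a fraction."""
    return uptime / (uptime + downtime)


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999):
        budget = {name: allowed_downtime(pct, p) for name, p in PERIODS.items()}
        print(f"{pct}%:", ", ".join(f"{n}={d}" for n, d in budget.items()))
```

Working backwards is often more useful: measure the actual downtime for a period, pass it to `availability`, and compare the result with the number of nines the business has agreed to pay for.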


Determining 9’s

How many 9’s are applicable for you?

• Availability percentage factors:
  – Cost of availability:
    • Organization’s infrastructure cost
      – Data center(s)
      – Hardware / network
    • Build and operational cost
      – Initial cost
      – Maintenance cost
  – Loss from downtime:
    • Revenue loss
    • Financial performance loss
    • Productivity loss
    • Reputation damage


Choice Of High Availability Levels

Level 1
• SI Repository DB Instance
• Single OMS

Level 2
• Active/Passive OMS
• DG/RAC (local)

Level 3
• Active/Active OMS
• RAC + DG (local)

Level 4
• Global Load Balanced
• Multi-site (DR)

[Figure: the four levels plotted against RTO and RPO on a scale of Weeks, Days, Hrs, Mins, Secs.]


Protection Offered With High Availability Levels

High Availability Level   Configuration
Level 1                   SI Repository DB Instance; Single OMS
Level 2                   Active/Passive OMS; DG (local)
Level 3                   Active/Active OMS; RAC + DG (local)
Level 4                   Global Load Balanced; Multi-site (DR)

[Table columns for each level: OMS Host Failure, OMS Storage Failure, Database Host Failure, Database Storage Failure, Site Failure. The per-level protection marks are not present in the transcript.]


High Availability Level 1

• Key features:
  – Single-instance databases (OMR, AOM)
  – Single OMS and single AOM
  – Agents talking directly to OMS and AOM
  – Local file system for the installs and Software Library
• Benefits:
  – Out of box: minimal configuration
  – Low implementation cost
  – Recommended only for development/test
• Drawbacks:
  – Limited scalability (Agent Count < 100, Target Count < 1000)

1) Configure backups of OMS configurations (emctl exportconfig)
2) Configure backups of Emkey, Software Library and AOM
3) Configure RMAN backups of OMR and AOM repository
4) Enable Flashback Database
5) Maintain consistent time-stamped backups at all layers

[Figure: components shown: Users, EMCLI, OMS, OMR, AOM, AOM Repository, Agent.]


High Availability Level 2

• Key features:
  – Local Data Guard standby (OMR, AOM)
  – Two OMSs (Active/Passive), single AOM
  – Shared storage with VIP-based failover
    • OMS install, Software Library, BI Publisher
• Benefits:
  – Offers protection against OMS node failure
  – Automatic failure detection and automated failover of OMS node
• Drawbacks:
  – Limited scalability (Agent Count < 100, Target Count < 1000)

1) Configure backups of OMS configurations (emctl exportconfig)
2) Configure backups of Emkey, Software Library and AOM
3) Configure highly available shared storage
4) Configure local Data Guard for data replication

[Figure: components shown: Active OMS, Passive OMS, NFS, OMR, OMR (Local Standby), AOM, AOM Repository, AOM Standby, Agent.]


High Availability Level 3

• Key features:
  – OMR and AOM repository on RAC database + local Data Guard standby
  – Two (or more) active OMSs, AOMs
  – Agents and console/EMCLI users talk to SLB
  – Shared file system for the Software Library and BI Publisher
  – SLB (load balancer) for network failover
• Benefits:
  – Best level of scalability and availability a single-site solution can provide
  – Protection against localized failures
• Drawbacks: No site-level failure protection

1) Configure backups of OMS configurations (emctl exportconfig)
2) Configure backups of Emkey, Software Library and AOM
3) Configure RAC and local Data Guard standby
4) Secure agents to the load balancer’s URL

[Figure: components shown: Users, EMCLI, SLB (Load Balancer), Active OMS, Active AOMs, NFS, OMR with RAC, OMR (Local Standby), AOM with RAC, AOM Standby, Agent.]


High Availability Level 4 - Disaster Recovery Solution
Storage replication based disaster recovery configuration

[Figure: ACTIVE and PASSIVE sites. Labels: DNS Lookup; Server Load Balancer of Primary data center; Server Load Balancer of Standby data center; OMS, Swlib, BIP, and AOM shares at each site; Primary OMS and Additional OMS (each with BIP, JVMD); AOM 1 and AOM2; OMR Repository; AOM Repository; Physical Standby Repositories; DB Replication with Dataguard from Primary to Standby; Storage Continuous Replication between the two storage arrays.]


High Availability Level 4 - Disaster Recovery Solution

• Key features:
  – Maximum enterprise protection scheme for Enterprise Manager availability
  – Fully replicated storage for OMSs, AOM between sites
    • Shared storage for the Software Library and BI Publisher
  – Local and global load balancers
    • Full protection for loss of a single site with failover to standby site at network level
  – Full protection against local failures and geographical/site disasters
  – Only full “DR”-ready implementation template for Enterprise Manager availability
• Benefits:
  – Minimized downtime and best level of scalability
  – Recommended for 24x7 production instance
• Drawbacks: Higher cost for setup

Best Practice

1) Configure backups of OMS configurations (emctl exportconfig)
2) Configure backups of Emkey, Software Library and AOM
3) Configure continuous storage replication
4) Secure agents to the global load balancer’s URL


High Availability Level 4 - Disaster Recovery Solution
Recommended design pattern for reliable and scalable deployment

• Abstraction of physical OMS hostnames:
  – Alias hostname for OMS hosts on active-passive sites
  – Alias hostname not required for AOM hosts
• Network abstraction using virtual hostname:
  – Secure agents against virtual SLB URL
• Dedicated replicated storage volumes:
  – OMSs/central agent binaries with their inventories
  – AOM binaries with their inventories
• Shared replicated storage volumes: SWLIB, BIP
• Storage replication technology should support:
  – Snapshots and consistent filesystem copies
  – Scheduled and on-demand replication

[Figure: ACTIVE site under CONTINUOUS STORAGE REPLICATION. OMS1 (Active) on Physical Host1 with alias emoms1 and OMS2 (Active) on Physical Host2 with alias emoms2, on OMS Volumes; AOM1 (Active) on Physical Host3 and AOM2 (Active) on Physical Host4, on AOM Volumes; Shared Software Lib (SWLIB) Volume; Shared BI Publisher (BIP) Volume.]


High Availability Level 4 - Disaster Recovery Solution
Migration from standby WebLogic domain to storage replication topology

• For customers
  – Who have implemented Level 4 in EM 12c using Standby WebLogic Domain
  – Who need to continue to implement Level 4 in EM 13c using storage replication
  – For whom one or more of the following conditions apply:
    • OMSs/central agents not configured with alias hostnames
    • Inventory for OMS and/or central agent not on replicated storage
    • OMSs/central agents not on replicated storage
• EM 13.2 Installer has a new mode: Upgrade and Transition to DR Readiness
  – Enabled by passing the new parameter UPGRADE_TRANSITION
• Reference: Upgrade Guide and OTN whitepaper for configuration details.


Top 10 strategic tips for achieving “Economies of Scale”


Tip1: Ensure Compliance With Sizing Guidelines

• Adhere to sizing recommendations for optimal performance
• Proactive measure to check increase in count of targets and agents
  – Repository-side Metric Extension to alert if the number of targets and agents increases above the recommended sizing guidelines for the installed configuration
  – Set alert threshold to the installed configuration: EVAL / SMALL / MEDIUM / LARGE
  – Refer to Appendix for configuration

Size     Agent Count       Target Count         Concurrent User Sessions
EVAL     < 10              < 100                < 3
SMALL    < 100             < 1000               < 10
MEDIUM   >= 100, < 1000    >= 1000, < 10,000    >= 10, < 25
LARGE    >= 1000           >= 10,000            >= 25, <= 50*

1) Ensure hardware requirements are optimal for the size (memory + storage)
2) Ensure Java heap size settings are optimal for OMS
3) Ensure database parameter settings are compliant for your configuration
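The sizing table can be read as "the deployment belongs to the largest tier that either count reaches". The sketch below encodes the agent and target boundaries from the table in Python; the function names and the `outgrown` helper are illustrative assumptions, and concurrent user sessions are left out for brevity.

```python
import bisect

TIERS = ["EVAL", "SMALL", "MEDIUM", "LARGE"]
# Upper bounds (exclusive) of EVAL, SMALL and MEDIUM, taken from the sizing table.
AGENT_LIMITS = [10, 100, 1000]
TARGET_LIMITS = [100, 1000, 10000]


def _tier_index(value: int, limits: list[int]) -> int:
    """Index of the first tier whose exclusive upper bound is above value."""
    return bisect.bisect_right(limits, value)


def sizing_tier(agent_count: int, target_count: int) -> str:
    """Smallest tier that accommodates both the agent and the target count."""
    index = max(_tier_index(agent_count, AGENT_LIMITS),
                _tier_index(target_count, TARGET_LIMITS))
    return TIERS[index]


def outgrown(installed_tier: str, agent_count: int, target_count: int) -> bool:
    """True when current counts call for a larger tier than the installed one."""
    needed = sizing_tier(agent_count, target_count)
    return TIERS.index(needed) > TIERS.index(installed_tier)


if __name__ == "__main__":
    print(sizing_tier(agent_count=120, target_count=800))
    print(outgrown("SMALL", agent_count=120, target_count=800))
```

Taking the maximum across both dimensions means that a site with few targets but many agents is still flagged, which is the conservative reading of the table.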


Tip2: Tune Enterprise Manager Repository

• Check the Top Wait Events metrics for the Enterprise Manager repository
  – The Log File Sync wait event should not appear in the top 5 waits
    • Correlate with AWR report findings to see if the impact is from a specific sub-system.
    • Could also indicate an increase in load inflow.
  – DB CPU should be the topmost wait event
  – Expect to see I/O-related wait events, but only writes
    • Check the SGA buffer cache size if there are read wait events in the top 5
  – For an ideal repository host (single instance or RAC cluster):
    • No more than 40% CPU utilization on average across all nodes in RAC

Ideal values for Log File Sync wait time:
Engineered system        ~1 millisecond
Non-engineered system    ~5 milliseconds or less
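These rules of thumb are easy to turn into a checklist. The following Python sketch takes figures you would read off an AWR report or the repository's Top Wait Events page and lists which guidelines are violated; the function name, its inputs, and the decision to treat the approximate "~1 ms" and "~5 ms" values as hard limits are all assumptions made for illustration.

```python
def review_repository_waits(top_wait_events, log_file_sync_ms, avg_cpu_pct,
                            engineered_system=False):
    """Return a list of guideline violations (empty when everything looks fine).

    top_wait_events: wait event names ordered from highest to lowest impact.
    log_file_sync_ms: average log file sync wait time in milliseconds.
    avg_cpu_pct: average CPU utilization across repository nodes, in percent.
    """
    findings = []
    top5 = [name.strip().lower() for name in top_wait_events[:5]]

    if not top5 or top5[0] != "db cpu":
        findings.append("DB CPU is not the topmost wait event")
    if "log file sync" in top5:
        findings.append("log file sync appears in the top 5 waits")

    # Assumption: the approximate guideline values are used as strict thresholds.
    limit_ms = 1.0 if engineered_system else 5.0
    if log_file_sync_ms > limit_ms:
        findings.append(
            f"log file sync averages {log_file_sync_ms} ms (guideline ~{limit_ms} ms)")
    if avg_cpu_pct > 40:
        findings.append(f"average CPU utilization is {avg_cpu_pct}% (guideline 40%)")
    return findings


if __name__ == "__main__":
    for finding in review_repository_waits(
            ["log file sync", "DB CPU", "db file parallel write"], 7.0, 55):
        print("-", finding)
```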


Tip3: Tune Critical OMS Subsystems For Performance

Optimize performance and improve scalability with RAC service

Critical Subsystem: Best Practice Recommendation

• Ping Subsystem: Anyone running RAC for the Enterprise Manager repository should implement a Ping service and pin it to a single RAC node.
• Repository Scheduler Jobs (Metric Rollups Subsystem): Configure a Rollup service only if there are cluster wait events. Pin it to a single RAC node instance if that instance can handle large I/O volume.
• Job Subsystem: Improve performance by configuring the Job service to two RAC node instances.
• Events Subsystem: Improve performance by configuring the Events service to a single RAC node instance.

• Refer to Appendix for information on using the self-monitoring feature to gauge performance
• Refer to Sizing guidelines for configuring RAC services for each of these subsystems


Tip4: Fine Tune Incoming Metric Data
Key indicators of increasing inflow of metric data

• Uncontrolled metric data generation causes significant performance degradation of Enterprise Manager


Tip4: Fine Tune Incoming Metric Data

• Best practice recommendation: Collect only what you need
  – Evaluate what the system is collecting and how frequently it is collecting
    • Enable/disable metrics to prevent unwanted metrics from being collected
      – Use monitoring templates to turn them off
      – Identify out-of-date monitoring templates and metric extensions to reduce inflow
    • Reduce the collection frequency of non-critical metrics
    • Set thresholds only on metrics you care about
    • Reduce the metric alerts by setting appropriate warning and critical thresholds
    • Adjust metric thresholds based on metric trend
    • Set number of occurrences to rule out sporadic alerts
    • Use corrective actions to auto-clear metric alerts


Tip5: Smarter Monitoring Configuration

• Strategy: As the enterprise grows, minimize efforts to monitor new targets
  – Organize targets into an administration group hierarchy
    • Streamline for monitoring management
    • Create dynamic groups to meet non-monitoring requirements (if group membership differs)
  – Define monitoring standards and associate Template Collections

[Figure: administration group hierarchy. ALL TARGETS splits into PROD and Non-PROD, each containing FINANCE and SALES. A Production Template Collection and a Non-Production Template Collection are associated with the hierarchy. Each collection carries monitoring and management settings: standard metrics and thresholds, Metric Extensions, Configuration Extensions.]


Tip6: Leverage Automated Privilege Propagation

• Grant privileges on administration group to role
  – Enables privileges to be propagated to members of a group
    • Automatically propagates to any new members
• Enhances target group management
  – Allows granting of privileges to thousands of targets
  – Allows granting various teams different levels of access to a target group
  – Onboarding a new administrator becomes simpler
• Follow the Principle of Least Privilege when granting target privileges

[Figure: administration group hierarchy. ALL TARGETS splits into PROD and Non-PROD, each containing FINANCE and SALES. On the Production-Finance group, the Database Application Developer Role is granted “Database Application Developer” on the group and the Database Application DBA Role is granted “Database Application DBA” on the group.]


Tip7: Prioritize Incoming Data Loading Requests

• Prioritize targets by setting the 'Lifecycle Status' property
  – The 'Lifecycle Status' property of the target influences the behavior for loading data (incoming), notification processing (outgoing), and job dispatching (outgoing)
  – The OMS and repository know the lifecycle status of each target and use it with every request to/from this target to prioritize the administration requests
    • Mission Critical and Production targets are given the highest priority so that alerts always get loaded.
    • Targets of Development or Test status may be asked to "back off" until the load returns to a manageable level.
  – You cannot add to these values, but you can modify the display name with EM CLI: emcli modify_lifecycle_stage_name

Possible Values
1 – Mission Critical (Highest)
2 – Production
3 – Stage (or blank/no value)
4 – Test / QA
5 – Development (Lowest)

Refer to Appendix for information on using the self-monitoring feature to gauge incoming data loading capacity
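To make the effect of lifecycle-based prioritization concrete, here is a toy model in Python. It is not how the OMS is implemented; it only illustrates the ordering described above using a priority queue. The status names mirror the slide, while the function names and the request format are hypothetical.

```python
import heapq
from itertools import count

# Priority values from the slide: lower number means higher priority.
LIFECYCLE_PRIORITY = {
    "Mission Critical": 1,
    "Production": 2,
    "Stage": 3,
    "Test": 4,
    "Development": 5,
}


def priority_of(lifecycle_status):
    """Blank or missing status is treated like Stage, as the slide states."""
    return LIFECYCLE_PRIORITY[lifecycle_status or "Stage"]


def drain_in_priority_order(requests):
    """Order (target_name, lifecycle_status) pairs by lifecycle priority.

    A running sequence number keeps requests of equal priority in arrival order.
    """
    sequence = count()
    heap = [(priority_of(status), next(sequence), name) for name, status in requests]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, _, name = heapq.heappop(heap)
        ordered.append(name)
    return ordered


if __name__ == "__main__":
    incoming = [("devdb", "Development"), ("erp", "Mission Critical"),
                ("misc", None), ("crm", "Production")]
    print(drain_in_priority_order(incoming))
```

Under load, the requests at the tail of this ordering are the ones that would be asked to back off first.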


Tip8: Ensure Healthy Agent Subsystem

• Agents: primary backbone components
  – Collect data from the targets and upload it to the OMS
  – Execute tasks on behalf of the Enterprise Manager users

• Improved diagnosability of agent unreachability with sub-status

Up Unmonitored Blocked Manually

Under Migration Post Blackout

Cannot Write to File System Blocked (Plug-in Mismatch)

Collections Disabled Blocked (Bounce Counter Mismatch)

Disk Full Agent Misconfigured

Communication Broken Status Pending (Post Blackout)

Status Pending Status Pending (Post Metric Error)

Target Addition in Progress Agent Unreachable

Refer to Appendix for information on using self-monitoring feature to gauge health of agents and troubleshooting them


Tip9: Leverage Always-on Monitoring
Monitoring continuity during planned downtime

• Highly available and scalable service that provides core monitoring capabilities
  – Can be ‘always on’ to ensure continuous monitoring of mission-critical environments
  – Safeguards against impact of planned Enterprise Manager downtime
  – Available in EM13c as a separate Java application (Self-Update)
  – Receives target availability alerts and metric alerts from agents
  – Sends email notifications for target down and critical metric alerts

[Figure: components shown: Agent, OMS, OMR, AOM, AOM Repository, Email.]


Tip10: Leverage Notification Blackouts
Monitoring visibility during maintenance periods

– Enables proactive monitoring of status and health of targets during maintenance
– Target is considered as being actively monitored
– Notifications / incident creation is stopped
– Target monitoring (events) visibility provided in console
– Two options:
  • “Under maintenance”: target downtime is excluded from availability (%) calculations
  • “Non-maintenance”: any target downtime impacts availability (%) calculations
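The difference between the two notification blackout options shows up in the reported availability. The Python sketch below works through one example; the function name is hypothetical, and it assumes that excluded downtime is simply removed from the measurement window, which may differ from the exact calculation Enterprise Manager performs.

```python
def availability_pct(period_hours, downtime_hours, under_maintenance):
    """Availability (%) for a period containing one blackout with downtime.

    under_maintenance=True  -> downtime is excluded from the calculation
                               (assumed here to shrink the measured window).
    under_maintenance=False -> downtime counts against availability.
    """
    if under_maintenance:
        measured = period_hours - downtime_hours
        counted_downtime = 0.0
    else:
        measured = period_hours
        counted_downtime = downtime_hours
    if measured <= 0:
        raise ValueError("nothing left to measure in this period")
    return 100.0 * (measured - counted_downtime) / measured


if __name__ == "__main__":
    # A 30-day month (720 hours) with 4 hours of downtime during the blackout.
    print(availability_pct(720, 4, under_maintenance=True))
    print(round(availability_pct(720, 4, under_maintenance=False), 2))
```

With four hours of downtime in a 720-hour month, the "Under maintenance" option leaves availability at 100%, while the "Non-maintenance" option reports roughly 99.44%.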


Summary : Tips for Maximizing Reliability and Scalability of Oracle Enterprise Manager

Choice of high availability deployment of Enterprise Manager determines reliability and scalability

Factors determining the operational efficiency of Enterprise Manager administration.

Management best practices for achieving Economies of Scale


Appendix


Repository-side Metric Extension
To compute increase in target and agents count

SELECT outer1.target_Guid,
       CASE
         WHEN outer1.TargetCount < 100   AND outer1.AgentCount < 10   THEN 'EVAL'
         WHEN outer1.TargetCount < 1000  AND outer1.AgentCount < 100  THEN 'SMALL'
         WHEN outer1.TargetCount < 10000 AND outer1.AgentCount < 1000 THEN 'MEDIUM'
         ELSE 'LARGE'
       END AS EvnAlert
FROM (
       SELECT inner2.target_Guid, inner1.TargetCount, inner1.AgentCount
       FROM (SELECT COUNT(*) TargetCount,
                    SUM(CASE WHEN target_type = 'oracle_emd' THEN 1 ELSE 0 END) AS AgentCount
             FROM mgmt$Target) inner1,
            (SELECT Target_Guid
             FROM mgmt$Target
             WHERE target_type = 'oracle_emrep') inner2
     ) outer1

Monitoring Setting: [Screenshot not present in the transcript.]


Tip3: Tune Critical OMS Subsystems For Performance

Self-monitoring Jobs subsystem:

• Monitor EM Jobs Service (Performance Charts)
  – Rollup of the repository operations (step scheduling) and all OMS operations (job dispatching) information
• Reports: 'Job System Diagnostic Report'


Tip3: Tune Critical OMS Subsystems For Performance

Self-monitoring metric rollup subsystem:

• Monitor throughput of the Metric Rollup Subsystem
  – Setup -> Manage Cloud Control -> Repository
  – Tune with a RAC service and by configuring additional rollup worker threads using the Configure option


Tip3: Tune Critical OMS Subsystems For Performance

Self-monitoring events subsystem:

• Event processing statistics captured per OMS
• Key indicators of event subsystem
  – Metric alert backlog
  – Metric collection errors
  – Notification delivery backlog

Setup -> Manage Cloud Control -> Repository -> Metrics


Tip7: Prioritize Incoming Data Loading Requests

Self-monitoring incoming data loading capacity

• Monitor using OMS Loader capacity (Processing Rate)
  – Aggregate operational performance for all OMSs
• Monitor using Agent backlog / Agent back-off rate
  – Too much information generated by all the Agents (metric data, alerts / state changes, metadata)

Setup -> Manage Cloud Control -> Health Overview


Tip8: Ensure Healthy Agent Subsystem
Self-monitoring agent subsystem

• Monitor agents from Setup -> Manage Cloud Control -> Agents
• Monitor Target Status Diagnostics Report: agent-based targets (Information Publisher report)
  – Check the Promote Status column and Broken Reason in Target Information
  – Check for the latest “Clean Heartbeat UTC” time in the “Agent Ping Status” table in the report
  – Ensure the OMS is reachable from the agent host and the agent from the OMS host
  – Check “emctl status” for various configurations, e.g. that the agent is communicating with the correct OMS
  – Check agent upload with “emctl upload”


Safe Harbor Statement

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Page 44: Angeline Janet Dhanarani - Oracle Cloud · •Key features : – Single Instance databases ( OMR,AOM) –Single OMS and single AOM –Agents talking directly to OMS and AOM –Local

©2014 Epsilon Data Management, LLC. Private & Confidential

Infrastructure Deployment

Tuning and Monitoring

Optimization

Oracle Open World 2016

Tips for Maximizing Reliability and Scalability of Oracle Enterprise Manager [CON6988]

Enterprise Manager High Availability and Disaster Recovery Topology

Disaster Recovery OMS Setup

DR uses the storage replication methodology for EM with alias host names, dating back to 12.1.0.2

Our own custom setup: a role-based service on the primary instance(s) controls OMS start/stop

srvctl add service -d REPOS -s OMS_PRIMARY -r "REPOS1,REPOS2" -P BASIC -l PRIMARY

• We deployed a FAN (Fast Application Notification) callout script to automatically start/stop the OMS on the primary nodes based on the state of the primary service above.

• We commented out the default OMS restart script, since we don’t want the DR OMS to start automatically upon a server reboot

• Only the Data Guard switchover/failover command needs to be run; all other operations follow from it

Enterprise Manager High Availability and Disaster Recovery Topology
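
The slides do not show the FAN callout script itself, so the following is only a sketch of the decision logic such a script might contain. It assumes the callout receives the event type as its first argument followed by key=value pairs (for example service=OMS_PRIMARY status=up), and that both SERVICE and SERVICEMEMBER events are of interest; the function names are hypothetical. It returns the emctl command to run rather than executing it.

```python
WATCHED_SERVICE = "OMS_PRIMARY"  # service name taken from the srvctl command above


def parse_fan_event(args):
    """Split callout arguments into (event_type, {key: value}). Payload layout is assumed."""
    if not args:
        return "", {}
    event_type = args[0].upper()
    fields = {}
    for token in args[1:]:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key.lower()] = value
    return event_type, fields


def decide_oms_command(args):
    """Return the emctl command for this event, or None if no action is needed."""
    event_type, fields = parse_fan_event(args)
    if event_type not in ("SERVICE", "SERVICEMEMBER"):
        return None
    # The service may be reported with a domain suffix; compare the short name only.
    service = fields.get("service", "").split(".")[0].upper()
    if service != WATCHED_SERVICE:
        return None
    status = fields.get("status", "").lower()
    if status == "up":
        return ["emctl", "start", "oms"]
    if status == "down":
        return ["emctl", "stop", "oms"]
    return None
```

A real callout would pass its own command-line arguments to decide_oms_command and run the returned command as the OMS software owner. Keeping the decision separate from the execution makes the logic easy to test without touching a live OMS.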

Groups With Propagating Privilege – We use them for many things

• Are used for privilege delegations

• Are Nested

• Are used for reporting

• Are used for monitoring

Monitoring Configurations

Monitor Enterprise Manager Performance

• Check built-in health monitors – Health Overview

• Check Upload Backlog

• Set appropriate incident rules to monitor Enterprise Manager system health

• Possibly increase reverse agent ping to account for network blips

Monitor The Monitor

Built-in health monitors

Monitor what is going on in the system in terms of events raised

• Check the frequency of metric alerts – are any metrics alerting excessively?

• Ability to make reports to review the system

• Query EM views such as

• MGMT$EVENTS

• MGMT$INCIDENTS
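
One way to answer the "alerting excessively" question is to count events per target and metric and list those above a threshold. The sketch below assumes the rows have already been fetched from a repository view such as MGMT$EVENTS and reduced to (target name, metric name) pairs; the pair layout, the function name, and the sample values are illustrative assumptions, not the view's actual columns.

```python
from collections import Counter


def noisy_metrics(events, threshold):
    """Return [((target, metric), count), ...] for pairs seen at least `threshold` times.

    `events` is an iterable of (target_name, metric_name) pairs, one per event.
    Results are ordered from most to least frequent.
    """
    counts = Counter(events)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Illustrative sample data only.
sample = [
    ("db01", "Tablespace Space Used (%)"),
    ("db01", "Tablespace Space Used (%)"),
    ("db01", "Tablespace Space Used (%)"),
    ("host7", "CPU Utilization (%)"),
]
print(noisy_metrics(sample, threshold=2))
```

The threshold is a site-specific judgment; a metric that fires many times a day on the same target is usually a candidate for a threshold change rather than for more notifications.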

Custom Report On Metrics Triggering Alerts

Monitor The Monitor

Example of metrics causing too much noise

• Use Incident Manager to view open events/incidents/problems

• Query MGMT$METRIC_CURRENT, MGMT$METRIC_HOURLY, MGMT$METRIC_DAILY, and/or MGMT$INCIDENTS

• If you find an issue, you can modify the default and push out a template, containing just the metric you want to change, to all targets

• You can do it programmatically using EMCLI if desired.
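
As a rough sketch of the programmatic route, the helper below builds an EMCLI command line for applying a monitoring template to a list of targets. It assumes the apply_template verb with -name and -targets options, where targets are given as name:type pairs separated by semicolons; confirm the exact syntax for your release with "emcli help apply_template". The template and target names are hypothetical.

```python
def build_apply_template_args(template_name, targets):
    """Build the argument list for an assumed `emcli apply_template` call.

    `targets` is a list of (target_name, target_type) pairs.
    """
    if not targets:
        raise ValueError("at least one target is required")
    # Assumed format: name1:type1;name2:type2
    target_spec = ";".join(f"{name}:{ttype}" for name, ttype in targets)
    return [
        "emcli",
        "apply_template",
        f"-name={template_name}",
        f"-targets={target_spec}",
    ]


# Hypothetical template and targets, for illustration only.
args = build_apply_template_args(
    "Quiet Tablespace Thresholds",
    [("db01", "oracle_database"), ("db02", "oracle_database")],
)
print(" ".join(args))
```

The resulting list can be handed to subprocess.run once an EMCLI session has been established with "emcli login". Building the arguments as a list avoids shell quoting problems with template names that contain spaces.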

Monitor The Monitor – Check for metrics causing too much noise

Thanks!
