High Availability and Disaster Recovery Topologies - OMF Canberra June 2014
Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
High Availability and Disaster Recovery Topologies
Damien McAullay
Oracle Fusion Middleware
June 2014
Business Continuity Planning (BCP)
• How to make your business “life” go on in the case of a disaster
  – It’s about the business, not the means
  – Does not necessarily incorporate IT
    • (but usually does in the 21st century)
Disaster recovery
• The “IT” part of BCP
• How do I recover my data, configurations, …?
• Where do I restore my data to?
• How can my users get access to the recovered environments?
Some data recovery strategies
Take periodic backups to local media, store media offsite
Replicate data to another site
Back up directly to an offsite location
Replicate data to the cloud
High availability

Availability               Downtime per year
90% (one nine)             36.5 days
99% (two nines)            3.65 days
99.9% (three nines)        8.76 hours
99.99% (four nines)        52.56 minutes
99.999% (five nines)       5.26 minutes
99.9999% (six nines)       31.5 seconds
99.99999% (seven nines)    3.15 seconds
• Determined by “up time”
  – (total time – down time) / total time
• Target availability often expressed in class of “nines”
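The downtime figures in the table follow directly from the up-time formula. A minimal sketch, assuming a 365-day year; the function names are illustrative and not taken from the slides:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def availability(total_time: float, down_time: float) -> float:
    """Up time as a fraction of total time: (total - down) / total."""
    return (total_time - down_time) / total_time


def downtime_seconds_per_year(availability_pct: float) -> float:
    """Downtime per year permitted by a target availability percentage."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (90, 99, 99.9, 99.99, 99.999):
        hours = downtime_seconds_per_year(pct) / 3600
        print(f"{pct}% allows {hours:.2f} hours of downtime per year")
```

Each additional nine divides the permitted downtime by ten.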
High availability
• Often practical to pre-define downtime (e.g. maintenance windows, periods where users are not active like public holidays)
• Three key aspects:
  – No single points of failure
  – Reliable switching mechanism(s)
  – Capability to detect failure, recover/bypass, and alert technician
WebLogic
Example scenario: WebLogic web-based application, database, and internal/external users
Eliminate SPOF
Add 2nd WebLogic server to cluster
Add a 2nd WebLogic node to cluster
Add 2nd Database server
Add load balancers to distribute load across WLS and DB
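To see why removing single points of failure matters, here is a rough sketch of how redundancy changes overall availability. It assumes component failures are independent, which real deployments only approximate, and the 99% per-component figure is a made-up example, not a measurement:

```python
from math import prod


def parallel(availability: float, copies: int) -> float:
    """Redundant copies: the tier is down only if every copy is down.

    Assumption: copies fail independently of each other.
    """
    return 1 - (1 - availability) ** copies


def serial(*tiers: float) -> float:
    """Chained tiers: the service is up only if every tier is up."""
    return prod(tiers)


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one WLS or DB server
    before = serial(single, single)                            # one WLS, one DB
    after = serial(parallel(single, 2), parallel(single, 2))   # two of each
    print(f"single servers:  {before:.4%}")
    print(f"paired servers:  {after:.4%}")
```

The load balancers added in the last step become tiers in the chain themselves, which is why the diagram shows them in pairs as well.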
[Diagram: four WLS instances with two load balancers (LB)]
Reliable switching
Use a network load-balancer (e.g. F5 or CSM) to distribute requests across WLS
Use Active GridLink data source in WLS to connect to RAC Database (clustered)
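The idea behind reliable switching can be illustrated with a toy round-robin balancer that skips backends marked as down. This is a conceptual sketch only: it does not reflect how an F5 appliance or Active GridLink is implemented or configured, and the class and backend names are hypothetical:

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in turn, bypassing any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        if not self.healthy:
            raise RuntimeError("no healthy backends available")
        # One full lap of the ring is enough to reach a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["wls1", "wls2", "wls3"])  # hypothetical names
    lb.mark_down("wls2")
    print([lb.next_backend() for _ in range(4)])
```

Requests keep flowing while a failed server is bypassed, and it rejoins the rotation once it is marked up again.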
[Diagram: F5 in front of four WLS instances, connected to RAC via Active GridLink]
Monitoring, recovery, and alerting
Use OEM to monitor WLS/DB, push metrics into service desk platform
Use OEM/scripts to remediate common/known problems (e.g. restart WLS on OOM)
Add notifications for outages, performance degradation, etc. to technicians
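The detect, remediate, and alert steps above can be sketched as a small decision routine. This is a hypothetical illustration, not an OEM API: the `Watchdog` class, its thresholds, and the action names are all assumptions chosen for the example:

```python
from dataclasses import dataclass


@dataclass
class Watchdog:
    """Decide what to do after each health probe (hypothetical policy)."""

    failure_threshold: int = 3  # consecutive failed probes before acting
    max_restarts: int = 2       # automatic restarts before escalating
    failures: int = 0
    restarts: int = 0

    def observe(self, probe_ok: bool) -> str:
        if probe_ok:
            # Healthy again: clear all counters.
            self.failures = 0
            self.restarts = 0
            return "ok"
        self.failures += 1
        if self.failures < self.failure_threshold:
            return "wait"       # could still be a transient blip
        self.failures = 0
        if self.restarts < self.max_restarts:
            self.restarts += 1
            return "restart"    # known remediation, e.g. restart the server
        return "alert"          # remediation exhausted: notify a technician


if __name__ == "__main__":
    dog = Watchdog(failure_threshold=2, max_restarts=1)
    for ok in (False, False, False, False, True):
        print(dog.observe(ok))
```

Trying a known remediation first and escalating only when it stops working keeps technicians from being paged for problems a script can fix.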
[Diagram: F5, four WLS instances, and RAC, with Oracle Enterprise Manager and Your Service Desk Platform]
Fast disaster recovery
[Diagram: WLS clusters and RAC databases across data centres DC1 and DC2]
Maintenance
• Server patching
[Diagram labels: 1.0, 1.1]
• Deploying app updates
Closing advice …
• Change Control
  – Make sure you can identify the current “state” of your deployments
• Practice makes perfect
  – The more often you rebuild your environments, the better you’ll perform on race day
  – Private-cloud-style provisioning and CI encourage practices useful for DR
• Don’t reinvent the wheel
  – Use the Oracle Maximum Availability Architectures as a starting point
• Start with your BCP
  – HA/DR is not cheap, so don’t do anything unnecessary