Exadata Database Machine: Maximum Availability Architecture (MAA)
Platinum Tier Focused
April 2020
Safe harbor statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
Copyright © 2020 Oracle and/or its affiliates.
Oracle Maximum Availability Architecture (MAA) Solution Options
BRONZE: Dev, Test, Prod - Single Instance or Multitenant Database with Backups
• Single Instance with Clusterware Restart
• Advanced backup/restore with RMAN
• Optional ZDLRA with incremental forever and near zero RPO
• Storage redundancy and validation with ASM
• Multitenant Database/Resource Management with PDB features
• Online Maintenance
• Some corruption protection
• Flashback technologies
[Figure: Single Instance Database with Local Backup in the Primary Availability Domain; Replicated Backups in the Secondary Availability Domain]
Outage Matrix
Unplanned Outage: RTO / RPO Service Level Objectives (f1)
Recoverable node or instance failure: Minutes (f2)
Disasters (corruptions and site failures): Hours to days; RPO since last backup, or near zero with ZDLRA
Planned Maintenance
Software/hardware updates: Minutes (f2)
Major database upgrade: Minutes to an hour
f1: RPO=0 unless explicitly specified
f2: Exadata systems have RAC, but the Bronze Exadata configuration (a single-instance database running with Oracle Clusterware) has the highest consolidation density to reduce costs
SILVER: Prod/Departmental
Bronze +
• Real Application Clusters (RAC)
• Application Continuity
[Figure: RAC Database with Local Backup in the Primary Availability Domain; Replicated Backups in the Secondary Availability Domain]
Outage Matrix
Unplanned Outage: RTO/RPO Service Level Objectives (f1)
Recoverable node or instance failure: Single digit seconds (f2)
Disasters (corruptions and site failures): Hours to days; RPO since last backup, or near zero with ZDLRA
Planned Maintenance
Software/hardware updates: Zero (f2)
Major database upgrade: Minutes to an hour
f1: RPO=0 unless explicitly specified
f2: To achieve zero downtime or lowest impact, apply application checklist best practices
Checklist found in MAA OTN: https://www.oracle.com/technetwork/database/options/clustering/applicationcontinuity/adb-continuousavailability-5169724.pdf
Transparent Application Continuity (TAC)
Application does not see errors during outages
• Uses Application Continuity and Oracle Real Application Clusters
• Transparently tracks and records session information in case there is a failure
• Built into the database, so it works without any application changes
• Rebuilds session state and replays in-flight transactions upon unplanned failure
• Planned maintenance can be handled by TAC to drain sessions from one or more nodes
• Adapts as applications change: protected for the future
[Figure: a request passes through Transparent Application Continuity, which hides errors and timeouts from the application]
Planned Maintenance (without Outages!)
1. Database service is relocated or stopped
2. Service starts on another RAC instance
3. Sessions connected to the service are drained
4. New sessions connect to the service on another instance
5. Results from database requests are returned to the user
6. Maintenance activities can start on the first node (rolling)
[Figure: RAC cluster with callouts 1 to 6 marking the steps above]
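The rolling sequence above can be traced with a small simulation. This is a minimal sketch, not Oracle's implementation: the `Instance` class and `rolling_maintenance` function are hypothetical, sessions are plain counters, and draining is treated as instantaneous (in practice FAN events and connection-pool drain timeouts govern how quickly sessions move).

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for one RAC instance."""
    name: str
    sessions: int = 0
    services: set = field(default_factory=set)
    patched: bool = False


def rolling_maintenance(instances, service):
    """Patch instances one at a time while `service` stays available."""
    if len(instances) < 2:
        raise ValueError("rolling maintenance needs at least two instances")
    log = []
    for node in instances:
        target = next(i for i in instances if i is not node)
        # Steps 1-2: stop the service on this node; it keeps running on another instance.
        node.services.discard(service)
        target.services.add(service)
        # Steps 3-4: drain existing sessions; new sessions land on the target.
        target.sessions += node.sessions
        node.sessions = 0
        # The service must still be offered somewhere (no outage).
        assert any(service in i.services for i in instances)
        # Step 6: maintain the drained node, then offer the service there again.
        node.patched = True
        node.services.add(service)
        log.append(f"patched {node.name} while {target.name} served {service}")
    return log


cluster = [Instance("inst1", 10, {"oltp"}), Instance("inst2", 5, {"oltp"})]
for line in rolling_maintenance(cluster, "oltp"):
    print(line)
```

The design choice worth noting is that the service is relocated before any session is moved, so new connections never target the node about to be patched.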
Unplanned Outages, without Impact
Outage or interruption at the database:
1. Database request is interrupted by an outage or timeout
2. Session reconnects to the RAC cluster (or the standby)
3. Database request replays automatically
4. Result from the database request is returned to the user
[Figure: Primary and Active Data Guard Standby, with callouts 1 to 4 marking the steps above]
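The four steps above can be illustrated with a toy replay loop. This is a conceptual sketch only: `ConnectionLost`, `run_request`, and the connect callables are hypothetical, and the sketch omits the commit-outcome check (Transaction Guard) that real Application Continuity relies on so a replay never commits the same transaction twice.

```python
class ConnectionLost(Exception):
    """Hypothetical error raised when the serving instance goes away."""


def run_request(calls, targets):
    """Run every call in a request; on an outage, reconnect and replay.

    calls:   callables taking a session dict (stand-ins for SQL calls)
    targets: connect callables tried in order, e.g. RAC instances, then the standby
    """
    for connect in targets:
        try:
            session = connect()
            result = None
            # Step 3: each attempt replays the whole request from the start.
            for call in calls:
                result = call(session)
            return result  # Step 4: result returned to the user
        except ConnectionLost:
            # Steps 1-2: request interrupted; reconnect to the next target.
            continue
    raise RuntimeError("no target could complete the request")


attempts = []


def query(session):
    attempts.append(session["instance"])
    if session["instance"] == "inst1":
        raise ConnectionLost()  # simulated outage on the first instance
    return "42 rows"


print(run_request([query], [lambda: {"instance": "inst1"}, lambda: {"instance": "inst2"}]))
```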
Checklist for Achieving Zero Application Downtime
1. Use an Oracle Clusterware service (never use the default service)
2. Use the recommended connection string
3. Configure FAN for the connection pool
4. Drain your service
5. Use Application Continuity or Transparent Application Continuity

1) MAA Whitepaper: Application Checklist for Continuous Service for MAA Solutions
2) Using RHPhelper to Minimize Downtime During Planned Maintenance on Exadata (MOS 2385790.1)
3) Fleet Patch and Provisioning incorporates MAA practices
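Checklist item 2 can be sketched as a connect-descriptor builder. This is a minimal sketch, assuming one Clusterware-managed service reachable through a primary-site and a standby-site SCAN address; the function name, host names, and service name are hypothetical. CONNECT_TIMEOUT, RETRY_COUNT, RETRY_DELAY, TRANSPORT_CONNECT_TIMEOUT, and LOAD_BALANCE are standard Oracle Net parameters, but the values below are assumptions modeled on the checklist's example; confirm them against the current whitepaper before use.

```python
def recommended_descriptor(service, primary_scan, standby_scan, port=1521):
    """Build an Oracle Net connect descriptor with one address list per site."""

    def address_list(host):
        # LOAD_BALANCE=on randomizes the order of addresses within the list.
        return (
            "(ADDRESS_LIST=(LOAD_BALANCE=on)"
            f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        )

    return (
        "(DESCRIPTION="
        # Retry long enough to ride out a failover or switchover (assumed values).
        "(CONNECT_TIMEOUT=90)(RETRY_COUNT=50)(RETRY_DELAY=3)"
        "(TRANSPORT_CONNECT_TIMEOUT=3)"
        + address_list(primary_scan)
        + address_list(standby_scan)
        # Always connect to a named Clusterware service, never the default one.
        + f"(CONNECT_DATA=(SERVICE_NAME={service})))"
    )


print(recommended_descriptor("sales_rw", "prim-scan.example.com", "stby-scan.example.com"))
```

Listing both sites in one descriptor lets the same alias follow the service after a role transition, as long as the service is only started where the database holds the matching role.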
GOLD: Mission Critical
Silver +
• Active Data Guard
• Comprehensive Data Protection
MAA Architecture:
• At least one standby required across AD or region
• Primary in one data center (or AD) replicated to a standby in another data center
• Active Data Guard Fast-Start Failover (FSFO)
• Local backups on both primary and standby
[Figure: Primary Region (AD1, AD2) hosting the primary and a local standby, each with a local backup; a remote standby in the Secondary Region; DG FSFO]
Outage Matrix
Unplanned Outage: RTO/RPO Service Level Objectives (f1)
Recoverable node or instance failure: Single digit seconds (f2)
Disasters (corruptions and site failures): Seconds to 2 minutes; RPO zero or seconds
Planned Maintenance
Software/hardware updates: Zero (f2)
Major database upgrade: Less than 30 seconds
f1: RPO=0 unless explicitly specified
f2: To achieve zero downtime or lowest impact, apply application checklist best practices
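The Fast-Start Failover behavior behind the Gold "seconds to 2 minutes" objective can be sketched as a decision function. This is a simplified model under stated assumptions, not the Data Guard broker's algorithm: the class and function are hypothetical, and `threshold_s` and `lag_limit_s` only mirror the intent of the broker properties FastStartFailoverThreshold (30 seconds by default) and FastStartFailoverLagLimit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverInputs:
    """Hypothetical snapshot of what the observer and standby can see."""
    seconds_observer_lost_primary: float  # how long the observer has lost the primary
    standby_sees_primary: bool            # is the standby still in contact with the primary?
    standby_synchronized: bool            # all redo received under SYNC transport (RPO zero)
    redo_gap_seconds: float               # potential data loss if not synchronized


def should_failover(s: FailoverInputs, threshold_s: float = 30.0,
                    lag_limit_s: Optional[float] = None) -> bool:
    """Simplified Fast-Start Failover decision; not the broker's actual logic."""
    if s.seconds_observer_lost_primary < threshold_s:
        return False  # primary not yet considered lost
    if s.standby_sees_primary:
        return False  # primary still alive from the standby's view: avoid split brain
    if s.standby_synchronized:
        return True   # zero data loss failover
    # Asynchronous case: fail over only if potential data loss stays within the limit.
    return lag_limit_s is not None and s.redo_gap_seconds <= lag_limit_s


print(should_failover(FailoverInputs(45, False, True, 0)))
```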
Active Data Guard Overview
[Figure: Primary (open read-write) replicating to a Standby (open read-only)]
Offload read-only or read-mostly workloads to the standby database
• Synchronous zero data loss replication
• Database rolling upgrade to reduce downtime for planned maintenance
• Automatic failover for high availability
• Zero data loss at any distance
• Automatic block repair
• DML redirection
• Multi-instance Redo Apply for RAC (In-Memory supported)
Active Data Guard Far Sync
Zero Data Loss Protection at Any Distance
[Figure: Primary Database sends redo SYNC (limited distance) to a Far Sync Instance, which forwards it ASYNC (any distance, redo compressed over WAN) to the Active Standby Database]
Primary Database
• Production copy
Far Sync Instance
• Oracle control file and log files
• No database files
• No media recovery
• Offload transport compression and/or encryption
Active Standby Database
• Zero data loss failover target
• Database open read-only
• Continuous Oracle validation
• Manual or automatic failover
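SYNC is limited by distance because each commit waits for the remote acknowledgment, so round-trip time is added to commit latency. The sketch below uses the approximate speed of light in fiber (about 200 km per millisecond) as a lower bound on round-trip time; the commit-latency budget and the planning function are hypothetical, and measured latency will always be higher than this bound.

```python
FIBER_KM_PER_MS = 200.0  # light in optical fiber covers roughly 200 km per millisecond


def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound on round trip over fiber; real networks add more."""
    return 2 * distance_km / FIBER_KM_PER_MS


def plan_redo_transport(standby_km, far_sync_km, commit_budget_ms):
    """Return (source, destination, mode) hops under a hypothetical latency budget."""
    if min_round_trip_ms(standby_km) <= commit_budget_ms:
        return [("primary", "standby", "SYNC")]
    if min_round_trip_ms(far_sync_km) <= commit_budget_ms:
        # SYNC to a nearby Far Sync instance keeps zero data loss;
        # the Far Sync instance forwards redo ASYNC to the distant standby.
        return [("primary", "far_sync", "SYNC"), ("far_sync", "standby", "ASYNC")]
    return [("primary", "standby", "ASYNC")]  # zero data loss is not protected


print(plan_redo_transport(standby_km=2000, far_sync_km=50, commit_budget_ms=2.0))
```

For example, a standby 2,000 km away costs at least 20 ms per commit under direct SYNC, while a Far Sync instance 50 km away costs at least 0.5 ms.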
PLATINUM: Extreme Critical
Gold +
• GoldenGate Active/Active Replication
• Optional Sharding & Edition-Based Redefinition
MAA Architecture:
• Each GoldenGate “primary” replica protected by Exadata, RAC and Active Data Guard
• Primary in one data center (or AD) replicated to another primary in a remote data center (or AD)
• Oracle GoldenGate & Edition-Based Redefinition for zero downtime application upgrade
• Sharding for scalability and fault isolation
• Local backups on both sites
• Achieve zero downtime through custom failover to the GoldenGate replica
[Figure: Primary Region and Secondary Region (AD1, AD2 each), each with a primary, a standby and local backups; GoldenGate replication between the two primaries]
Outage Matrix
Unplanned Outage: RTO/RPO Service Level Objectives (f1)
Recoverable node or instance failure: Zero or single digit seconds (f2/f3)
Disasters including corruptions and site failures: Zero (f3)
Planned Maintenance
Most common software/hardware updates: Zero (f2)
Major database upgrade, application upgrade: Zero (f3)
f1: RPO=0 unless explicitly specified
f2: To achieve zero downtime or lowest impact, apply application checklist best practices
f3: Application failover is custom or with Global Data Services
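Read together, the four outage matrices map business objectives to a minimum tier. This is a sketch of that lookup under stated assumptions: the requirement keys are hypothetical labels, and each entry is read off the matrices, so the footnotes still apply (Silver and above assume the application checklist; Platinum assumes custom or Global Data Services failover).

```python
TIERS = ["BRONZE", "SILVER", "GOLD", "PLATINUM"]

# Lowest tier whose outage matrix meets each objective (hypothetical labels).
MIN_TIER_FOR = {
    "node_failure_rto_seconds": "SILVER",           # Bronze: minutes
    "zero_downtime_software_updates": "SILVER",     # Bronze: minutes
    "site_disaster_rto_under_2_minutes": "GOLD",    # Bronze/Silver: hours to days
    "major_upgrade_under_30_seconds": "GOLD",       # Bronze/Silver: minutes to an hour
    "zero_downtime_major_upgrade": "PLATINUM",
    "zero_rto_site_disaster": "PLATINUM",
}


def minimum_tier(requirements):
    """Return the lowest MAA tier covering every requirement (KeyError if unknown)."""
    return max((MIN_TIER_FOR[r] for r in requirements), key=TIERS.index, default="BRONZE")


print(minimum_tier(["zero_downtime_software_updates", "site_disaster_rto_under_2_minutes"]))
```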
Data Center Architecture & Requirements
• A minimum of 2 Regions for Disaster Recovery Failover
  • Region is a localized geographic area
  • West Coast NAS – Primary example
  • East Coast NAS – Secondary example
• Each Region should have a minimum of 2 Availability Domains (AD)
• Availability Domain Characteristics
  • ADs are isolated from each other and fault tolerant
  • ADs do not share infrastructure such as power, cooling or AD network
  • A failure of one AD does not affect other ADs
  • ADs within a Region are connected via a high speed network within the same geographical area
[Figure: Primary Region – West NAS and Secondary Region – East NAS, each with AD1 and AD2; ADs linked at high speed with < 1 ms latency]
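The measurable parts of these requirements can be checked mechanically. A minimal sketch, assuming you already have an inventory of regions, ADs, and measured inter-AD latency; the `Region` class and `topology_problems` function are hypothetical, and AD independence (separate power, cooling, and network) cannot be verified this way.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    availability_domains: list   # AD names in this region
    inter_ad_latency_ms: float   # measured latency between this region's ADs


def topology_problems(regions, max_inter_ad_latency_ms=1.0):
    """List violations of the data center requirements (empty list means OK)."""
    problems = []
    if len(regions) < 2:
        problems.append("need at least 2 regions for disaster recovery failover")
    for r in regions:
        if len(set(r.availability_domains)) < 2:
            problems.append(f"{r.name}: need at least 2 availability domains")
        if r.inter_ad_latency_ms >= max_inter_ad_latency_ms:
            problems.append(f"{r.name}: inter-AD latency must be < {max_inter_ad_latency_ms} ms")
    return problems


west = Region("West NAS", ["AD1", "AD2"], 0.4)
east = Region("East NAS", ["AD1", "AD2"], 0.6)
print(topology_problems([west, east]))  # []
```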
Platinum Reference Architecture
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
[Figure: Primary Region 1 – West US with Prod A and STBY A across AD1/AD2; Secondary Region – East US with Prod B and STBY B across AD1/AD2; sync transport with zero data loss inside each region; an application tier in each region reads and writes to its Prod database and reads from its standby]
Reference Architecture – Zero App Downtime and Zero Data Loss
(Disaster Scenario: Loss of Entire Data Center)
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
• Automatic Data Guard failover: STBY A takes over as Prod A
• Optional client failover to Prod B
• Zero app and DB downtime with the Prod B replica
• Achieve eventual zero data loss by synchronizing replicas
[Figure: Primary Region 1 – West US, where the surviving AD now runs Prod A; Secondary Region – East US with Prod B and STBY B; application tiers read and write]
Reference Architecture – Switching Back
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
1. Reinstate the failed system
2. Data Guard switchover to return to the original state
Prod A returns to primary and STBY A to standby.
[Figure: Primary Region 1 – West US with Prod A and STBY A; Secondary Region – East US with Prod B and STBY B; application tiers in both regions]
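The two switch-back steps are role transitions, which can be traced with a small model. A sketch only: the role labels and functions are hypothetical; in a real broker configuration the corresponding operations are the DGMGRL `REINSTATE DATABASE` and `SWITCHOVER` commands.

```python
def primary_of(roles):
    return next(db for db, role in roles.items() if role == "PRIMARY")


def failover(roles, target):
    """Fast-Start Failover: the standby becomes primary; the old primary is left failed."""
    new = dict(roles)
    new[primary_of(roles)] = "NEEDS_REINSTATE"
    new[target] = "PRIMARY"
    return new


def reinstate(roles, db):
    """Step 1: bring the failed database back as a standby of the new primary."""
    if roles[db] != "NEEDS_REINSTATE":
        raise ValueError(f"{db} does not need reinstatement")
    return {**roles, db: "STANDBY"}


def switchover(roles, target):
    """Step 2: planned, zero data loss role swap back to the original layout."""
    if roles[target] != "STANDBY":
        raise ValueError(f"{target} is not a standby")
    return {**roles, primary_of(roles): "STANDBY", target: "PRIMARY"}


roles = {"ProdA": "PRIMARY", "STBYA": "STANDBY"}
roles = failover(roles, "STBYA")    # loss of the data center hosting ProdA
roles = reinstate(roles, "ProdA")
roles = switchover(roles, "ProdA")
print(roles)
```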
Reference Architecture – Upgrade Scenario
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
[Figure: starting point with all four databases (Prod A and STBY A in Primary Region 1 – West US; Prod B and STBY B in Secondary Region – East US) on V1; sync transport with zero data loss in both regions; application tiers read/write to each Prod and read from each standby]
Upgrade Scenario Step 1: Upgrade Prod B and Standby
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
1. Upgrade Prod B
2. Validate
3. Restart the standby on the V2 Oracle home (OH)
4. Upgrade with redo apply
Optionally redirect to Region 1 if the application allows.
[Figure: Primary Region 1 – West US (Prod A, STBY A) stays on V1 with sync transport and zero data loss; Secondary Region – East US (Prod B, LSTBY B) moves to V2 with async transport during the upgrade]
Upgrade Scenario Step 2: Synchronize GG Replicas
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
GoldenGate catch-up; optionally redirect to Region 1 if the application allows.
[Figure: Primary Region 1 – West US on V1 and Secondary Region – East US on V2, both with sync transport and zero data loss; application tiers read and write]
Upgrade Scenario Step 3: Co-Exist with V1 and V2
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
[Figure: Primary Region 1 – West US (Prod A, STBY A) on V1 and Secondary Region – East US (Prod B, STBY B) on V2; sync transport with zero data loss in both regions; application tiers in both regions read and write]
Platinum Advantages for Upgrade
Benefits
1. Zero Downtime and Zero Data Loss
2. Evaluate V1 and V2 at the same time
3. GoldenGate replication between V1 and V2 provides simple switchover and fallback
Final Decision Point
Once V2 has been validated and deemed acceptable, then:
• Repeat the process and upgrade both the V1 primary and standby at the same time
Upgrade Scenario Step 4: Upgrade Prod A and Standby A to V2
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
1. Upgrade Prod A
2. Validate
3. Restart the standby on the V2 Oracle home (OH)
4. Upgrade with redo apply
[Figure: all four databases on V2; async transport in Primary Region 1 – West US during the upgrade; sync transport with zero data loss in Secondary Region – East US; application tiers read and write]
Upgrade Scenario Steps 5/6: Synchronize and Back to Normal
Active Data Guard Fast-Start Failover, Oracle GoldenGate Replication
5. Synchronize GG
[Figure: all four databases on V2; Primary Region 1 – West US labeled "Sync Transport During Upgrade"; Secondary Region – East US with sync transport and zero data loss; application tiers read and write]
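Steps 1 through 6 repeat one pattern per GoldenGate replica, so the whole sequence can be sketched as a loop. This is an illustration under stated assumptions: the function, the version strings, and the `validate` callback are hypothetical, and real validation, Oracle home switching, and GoldenGate catch-up are reduced to log lines.

```python
def platinum_upgrade(versions, sites, new_version, validate):
    """Upgrade two GoldenGate replica sites one after the other.

    versions: dict of database name -> software version (mutated in place)
    sites:    two (primary, standby) pairs, upgraded in the order given
    validate: hypothetical acceptance test, callable(primary) -> bool
    """
    if len(sites) != 2:
        raise ValueError("this sketch assumes exactly two GoldenGate replicas")
    log = []
    for i, (primary, standby) in enumerate(sites):
        other = sites[1 - i][0]
        log.append(f"optionally route applications to {other}")
        versions[primary] = new_version          # 1. upgrade the primary
        if not validate(primary):                # 2. validate (decision point)
            log.append(f"fall back: keep serving from {other}")
            return log
        versions[standby] = new_version          # 3-4. new Oracle home, redo apply
        log.append(f"{primary} and {standby} on {new_version}; GoldenGate catch-up")
    return log


versions = {"ProdA": "V1", "STBYA": "V1", "ProdB": "V1", "STBYB": "V1"}
for step in platinum_upgrade(versions, [("ProdB", "STBYB"), ("ProdA", "STBYA")], "V2",
                             validate=lambda db: True):
    print(step)
```

Because the other replica keeps serving on the old version until validation passes, a failed validation leaves a working site to fall back to, which is the "simple switchover and fallback" benefit listed for Platinum.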
Unplanned Outages for Platinum MAA with Exadata
Exadata Cluster Network Fabric or Storage Failures
• Database Downtime (RTO): Zero
• Application Impact: Zero or Near Zero
• Data Loss (RPO): Zero
• Key Enablers: Exadata; ASM Disk Groups in High Redundancy
RAC Instance or Node Failures
• Database Downtime (RTO): Zero
• Application Impact: Single Digit Seconds
• Data Loss (RPO): Zero
• Key Enablers: Exadata, RAC; Application Continuity with MAA Checklist
Data Corruptions
• Database Downtime (RTO): Zero
• Application Impact: Zero or Isolated Failure
• Data Loss (RPO): Zero or Isolated Logical Impact
• Key Enablers: Active Data Guard; MOS 1302539.1; Flashback Technologies; ZDLRA
Disasters including database, cluster or site failures
• Database Downtime (RTO): Zero, since the GG replica is available
• Application Impact: Zero or Near Zero; Single Digit Seconds with GDS
• Data Loss (RPO): Eventual Zero
• Key Enablers: Oracle GoldenGate; Data Guard Fast-Start Failover; Custom App Failover or Global Data Services or Site Guard
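The "ASM Disk Groups in High Redundancy" enabler comes down to how many copies of each extent exist. A back-of-the-envelope sketch, assuming the standard ASM mirroring levels (normal keeps two copies, high keeps three, each in a different failure group, which on Exadata is a storage cell) and ignoring the free space that must be reserved for rebalancing; the function name is hypothetical.

```python
MIRROR_COPIES = {"NORMAL": 2, "HIGH": 3}  # ASM two-way vs three-way mirroring


def asm_protection(redundancy: str, raw_tb: float) -> dict:
    copies = MIRROR_COPIES[redundancy]
    return {
        # Upper bound: ignores free space reserved for rebalancing after a failure.
        "usable_tb_max": raw_tb / copies,
        # Failure groups (Exadata storage cells) that can be lost at once
        # while at least one copy of every extent remains.
        "failure_groups_tolerated": copies - 1,
    }


print(asm_protection("HIGH", 300.0))
```

This is also why high redundancy matters for rolling storage maintenance: with one cell offline for an update, a second failure can still be absorbed.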
Planned Maintenance for Platinum MAA with Exadata
Exadata Infrastructure SW or HW Updates
• Database Downtime (RTO): Zero
• Application Impact: Zero or Near Zero
• Key Enablers: Exadata Platform; ASM Disk Groups in High Redundancy
Database and Grid Infrastructure Software Updates
• Database Downtime (RTO): Zero
• Application Impact: Zero
• Key Enablers: RAC; Application Continuity; Continuous Availability - Application Checklist for Continuous Service for MAA Solutions
Database Upgrades or non-Rolling Updates
• Database Downtime (RTO): Zero
• Application Impact: Zero or Near Zero
• Key Enablers: GoldenGate; Custom Application failover or Global Data Services
Our mission is to help people see data in new ways, discover insights, unlock endless possibilities.