Adventures in Dataguard
-
Upload
jason-arneil -
Category
Technology
-
view
5.040 -
download
2
Transcript of Adventures in Dataguard
![Page 1: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/1.jpg)
Adventures in DataguardDr. Jason Arneil
![Page 2: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/2.jpg)
Motivation
Why Dataguard
![Page 3: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/3.jpg)
• Introduction
• The Motivation
• Dataguard Architecture & Features
• Creating a Physical Standby
• Maintaining your standby
• Using your Standby
• Performing a Switchover
AGENDA
![Page 4: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/4.jpg)
Introduction
Health Warning
![Page 5: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/5.jpg)
Introduction
• Jason Arneil
• System Administrator/DBA
• Using Oracle since 1998
• At Nominet since 2001
About Me
![Page 6: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/6.jpg)
Introduction
• Nominet is the internet registry for .uk domain names
• Nominet has been in existence for over 11 years
• Nominet is run as a not-for-profit company
• Nominet is owned by its members
• There are over 6 Million .uk domain names
About Nominet
![Page 7: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/7.jpg)
Motivation
Why Dataguard
• Big push on a Nominet Business Continuity Plan
• Dataguard is the Oracle solution for disaster recovery
• Physical Standby was the obvious option
• Maximum Availability Architecture (MAA)
![Page 8: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/8.jpg)
Motivation
Business Continuity Site
![Page 9: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/9.jpg)
Architecture & Features
Dataguard Processes
PrimaryDatabase
Transactions Physical/Logical StandbyDatabase
Backup /Reports
Transform Redo to SQL for SQL Apply
MRP/ LSP
ARCHArchived Redo Logs
Archived Redo Logs
ARCH
Oracle Net
StandbyRedo Logs
RFS
FAL
Online Redo Logs
LGWRLNS
![Page 10: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/10.jpg)
Architecture & Features
Dataguard Features
• Several Protection Modes
– Maximum Protection
– Maximum Availability
– Maximum Performance
• Several Transport Modes
– LGWR SYNC
– LGWR ASYNC
– ARCH
![Page 11: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/11.jpg)
Creating a Standby
Prepare Primary & Standby
• Prepare Primary Database
– Enable Force Logging
SQL> alter database force logging;
– Modify initialization parameters
• Prepare Standby Database
– Setup directory structure
– Create spfile with correct parameters
– Start database in nomount
![Page 12: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/12.jpg)
Creating a Standby
Log Transport Parameters
• LOG_ARCHIVE_CONFIG='DG_CONFIG=(PRIMARY, STANDBY)'
• LOG_ARCHIVE_DEST_1='LOCATION=/var/oracle/PRIMARY/arch'
• LOG_ARCHIVE_DEST_2='SERVICE=PRIMARC DB_UNIQUE_NAME=PRIMARY'
• LOG_ARCHIVE_DEST_3='SERVICE=STANDBY LGWR ASYNCREOPEN=15 MAX_FAILURE=10 OPTIONAL
VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STANDBY'
![Page 13: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/13.jpg)
Creating a Standby
ssh tunnels
• You may not wish your redo data being sent unencrypted across the internet to your standby. You can use ssh tunnels to avoid this
– ssh -N -L 3333:standby:1521 oracle@standby
• Now the tnsnames entry points to the localhost
STANDBYARC =
(DESCRIPTION =
(SDU = 32767)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT=3333)))
(CONNECT_DATA =
(SERVICE_NAME = STANDBY)))
![Page 14: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/14.jpg)
Creating a Standby
Some Other Parameters
• FAL_SERVER
• FAL_CLIENT
• ARCHIVE_LAG_TARGET
• STANDBY_FILE_MANAGEMENT
• DB_FILE_NAME_CONVERT
• LOG_FILE_NAME_CONVERT
![Page 15: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/15.jpg)
Creating a Standby
backup your primary
• Backup primary - rman is good
– rman> backup format '/backup/%U' database plus archivelog;
– rman> backup format '/backup/%U' current controlfile for standby;
• Recover backup on standby node
– I like using rman duplicate to create standby:
• (oracle$) rman target sys/password@PRIMARY auxiliary /
• rman> duplicate target database for standby;
![Page 16: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/16.jpg)
Creating a Standby
Start applying redo
• Create standby redo log files on both primary and standby:
– sql> alter database add standby logfile thread 2 group 42 (’PATH_TO_DATA/standbyredo01.log') size 512M;
• Now you can start the physical standby recovering logs:
– sql>alter database recover managed standby database disconnect from session;
• Or if you prefer real time apply:
– sql>alter database recover managed standby database using current logfile disconnect from session;
![Page 17: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/17.jpg)
Maintaining your standby
Monitoring the Standby
• You have to ensure your standby is keeping up with your primary
• You can check which was the last log to have been applied to your standby is
– sql> SELECT MAX(SEQUENCE#), THREAD# FROM V$ARCHIVED_LOG where APPLIED='YES' GROUP BY THREAD#;
MAX(SEQUENCE#) THREAD#
-------------- ----------
2976 1
1888 2
![Page 18: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/18.jpg)
Maintaining your standby
Monitoring Standby Progress
• A good way of checking what the background processes of your standby are up to is using v$managed_standby
– SQL> select process, sequence#, status
from V$managed_standby;
PROCESS SEQUENCE# STATUS
-------- ---------- ------------
ARCH 2967 CLOSING
ARCH 2974 CLOSING
RFS 2977 IDLE
MRP0 1889 APPLYING_LOG
RFS 1889 IDLE
RFS 2977 IDLE
![Page 19: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/19.jpg)
Maintaining your standby
Monitoring Your Standby
• You have to ensure your standby is keeping up with your primary
• V$DATAGUARD_STATS provides useful information
– SQL> select name, value from v$dataguard_stats;
NAME VALUE
-------------------------------- ------------------------------------
apply finish time +00 00:00:00
apply lag +00 00:00:11
estimated startup time 41
standby has been open N
transport lag +00 00:00:03
![Page 20: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/20.jpg)
Maintaining your standby
Monitoring Your Standby
• A way of finding out what has been happening to your standby over a period time is to look at the v$dataguard_status view
– Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 1 sequence 2977 (in transit)
– Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 1 sequence 2977 (in transit)
– Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 2 sequence 1889 (in transit)
– Remote File Server 01-AUG-07 Primary database is in MAXIMUM PERFORMANCE mode
– Remote File Server 01-AUG-07 RFS[53]: Successfully opened standby log 14: '+DATA2/standby/standbyredo02.log'
![Page 21: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/21.jpg)
Maintaining your standby
Oracle can’t divide by 0
• Standby was happily working away
– ORA-07445: exception encountered: core dump [kcrarmb()+152] [SIGFPE] [Integer divide by zero] [0x00085C300
• MRP process crashes
– No redo gets applied from this point
• Logs after the one that caused the ORA-07445 still being shipped
• A simple restart of the managed recovery process does a FAL and the standby is back up-to-date
![Page 22: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/22.jpg)
Maintaining your standby
kcrfr_resize2
• Lots of problems after upgrade to 10.2.0.3
– Recovery of Online Redo Log: Thread 2 Group 23 Seq 999 Reading mem 0
Mem# 0: +DATA3/standby/standbyredo11.log
ORA-00600: internal error code, arguments: [kcrfr_resize2], [652614828032], [268423168], [], [], [], [], []
• Perhaps caused by the following:
– Bug 3306010 OERI[kcrfr_resize2] possible in MEDIA recovery
Media recovery may fail with ORA-600 [kcrfr_resize2] when
the number of redo strands is set to a high value using
log_parallelism.
![Page 23: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/23.jpg)
Maintaining your standby
kcrfr_resize2
• This issue has recently been published as Note:453259.1
– Triggered by having a large log_buffer
• This bug affects 10.2.0.3 and potentially 9.2.0.8
• It is related to the size of the log_buffer parameter
• Fix is included in 10.2.0.4
![Page 24: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/24.jpg)
Maintaining your standby
kcrrupirfs
• ARC processes died on primary:
ORA-00600: [kcrrupirfs.20] [4] [368]
• Trace file showed the following:
Corrupt redo block 479421 detected: bad block number
Flag: 0x0 Format: 0x0 Block: 0x00000000 Seq: 0x00000000 Beg: 0x0 Cks:0x0 <<<<<<<--
----- Dump of Corrupt Redo Buffer -----000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
![Page 25: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/25.jpg)
Maintaining your standby
kcrrupirfs
• Oracle think initially think this ORA-600 error was hardware related
– There are NO indications of any hardware fault - the primary keeps running
• After a couple of weeks it was decided this was a “bug situation”
– This was bug 4767278 which talked about FAL not being able to read from multiple mirror sides when encountering invalid/stale redo in a file. Apparently required for ASM configurations because ASM does not guarantee all mirror sides contain same data after writing.
– We were using ASM, but external redundancy
– Oracle then said “The ASM group is not 100% sure if the patch 4767278 will fix the problem”
![Page 26: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/26.jpg)
Maintaining your standby
log corruption
• The Managed Recovery process crashed complaining about log corruption
MRP0: Background Media Recovery terminated with error 355
ORA-00355: change numbers out of order
ORA-00353: log corruption near block 2 change 1273622545 time 03/06/2007 08:32:46
ORA-00312: online log 13 thread 1: '+DATA2/standby/standbyredo01.log'
• Oracle blame the upgrade process at first. They suggest rebuilding the standby
• Then I notice that trying managed recovery rather than real time apply seems to allow the standby to progress
![Page 27: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/27.jpg)
Maintaining your standby
log corruption
• At this point Oracle say “it looks like a bug”
• Lots of time spent diagnosing the issue
– ALTER SYSTEM DUMP LOGFILE '+DATA2/nom/standby33.log' scn min 865465290 scn max 865465300;
• Eventually Oracle produced a patch 5746174
– MRP HANGS WITH ASYNC LNS AND PARALLEL ARCHIVAL
![Page 28: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/28.jpg)
Using Your Standby
Utilize those cpu cycles
• A Standby can be considered an insurance policy
• Several ways to utilize your standby
– Run your backups from your standby
– Open your standby read only for reporting
– Flashback standby to look at old data
– Open your standby read write for testing purposes
![Page 29: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/29.jpg)
Using Your Standby
Open for Reports
• You need to cancel managed recovery
– sql> alter database recover managed standby database cancel;
• Then simply open the standby
– sql> alter database open;
• Redo is still transported to your standby
• To transition back to applying redo shutdown the open standby, startup mount and restart the recovery process
![Page 30: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/30.jpg)
Using Your Standby
Open for read write
• You must have flashback database enabled for this
• Stop redo apply on standby
• Create a restore point
• Activate the Standby & perform read/write testing
• Flashback to restore point
• Start the redo on the Standby again
![Page 31: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/31.jpg)
Using Your Standby
Open for read write
Physical Standby Physical Standby
read write
RestorePoint
Flashback Database
Activate standby
![Page 32: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/32.jpg)
Using Your Standby
Flashback Database in a Nutshell
• Set up Flashback Database
– alter system set db_recovery_file_dest_size = 8G;
– alter system set db_recovery_file_dest = 'your flashback destination';
– alter system set db_flashback_retention_target = 1440 ;
– alter database flashback on;
• Once you have cancelled the standby recovery create a guaranteed restore point
– create guaranteed restore point before_activate;
•
![Page 33: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/33.jpg)
Using Your Standby
Open for read write
• Activate your Standby
– SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;
• You can open the Standby for business
– SQL> ALTER DATABASE OPEN;
• To become a Standby again shutdown and startup in mount
– SQL> FLASHBACK DATABASE TO RESTORE POINT BEFORE_ACTIVATE;
– SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
![Page 34: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/34.jpg)
Using Your Standby
Open for read write
• However things never go according to plan
– ORA-00600: internal error code, arguments: [3705], [1], [8], [3], [8], [], []
• This was bug 4479323 which is a bug with recovery (not standby specific) and only occurs in a RAC environment
• This is fixed in 10.2.0.3
![Page 35: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/35.jpg)
Doing a Switchover
It’s good to test
• A business continuity plan is no good unless it’s been tested
• It’s not all about the database
• Good to think in terms of services
![Page 36: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/36.jpg)
Doing a Switchover
Database Switchover
• Make sure your standby is up-to-date
• Check your primary database switchover status:
– primary> SELECT SWITCHOVER_STATUS FROM V$DATABASE;
• Switchover primary database
– primary> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY with session shutdown;
• Switchover the standby
– standby> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY with session shutdown;
![Page 37: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/37.jpg)
Doing a switchover
DNS Primer
• DNS allows translation from hostname to IP address
– example.co.uk IN A 162.0.0.1
• Our principle is all services are accessed through a CNAME
– anexample.co.uk 5M IN CNAME example.co.uk
• relocation of the service is just a case of changing where the CNAME points
![Page 38: Adventures in Dataguard](https://reader035.fdocuments.in/reader035/viewer/2022062220/555acaded8b42ab1128b5481/html5/thumbnails/38.jpg)
Conclusion
Conclusion
• Dataguard is an efficient DR solution for your primary database
• Dataguard is mostly reliable but is not without it’s blips
• There are opportunities for gaining added value from your standby
• You can’t test your Business continuity plan enough