Oracle GoldenGate 11gr2 IE and Oracle DG- Switchover-Fail-over Ops v1.1-ID1436913.1

ORACLE GOLDENGATE BEST PRACTICES: ORACLE GOLDENGATE 11GR2 INTEGRATED EXTRACT AND ORACLE DATA GUARD - SWITCHOVER/FAIL-OVER OPERATIONS Version 1.1a Document ID: 1436913.1 Date: November 25, 2014 Sourav Bhattacharya Center of Excellence

Table of Contents

1 Document Acceptance
    Reviewers
    Change Control
2 Overview
3 System Configuration in GoldenGate-Data Guard installations
    Oracle Data Guard Protection levels
4 Administrative considerations and assumptions
    Risks
    Responsibilities
5 Administrative scenarios involved in GoldenGate-Data Guard installations
6 Scenario 1: Switchover/Failover using a shared storage
    Source Database Switchover/Failover to Standby System
    6.1.1 Shared storage switchover/failover
    6.1.2 Non shared-storage switchover/failover
7 System clock
8 Conclusion
9 Appendix
    Appendix A – gg_11gie_ext_shared.sh
    Appendix B – gg_11gie_ext_non_shared.sh
    Appendix E – action.sh (ONLY for RAC)

1 Document Acceptance

Reviewers:

Reviewer | Title | Email
Tracy West | Consulting Member of Technical Staff | [email protected]

Change Control:

Date | Change | Changed by
2/29/2012 | Example for externaljob.ora update | Sourav
3/1/2012 | Updated in line with the best practices document for "classic extract switchover / failover with data guard." | Sourav
5-1-12 | Formatting corrections | SGeorge

2 Overview

Oracle GoldenGate is Oracle's strategic replication solution for data distribution and data

integration. Oracle Data Guard is Oracle’s strategic Disaster Recovery (DR) solution for the

Oracle Database. It is common for customers to deploy both capabilities within the same

configuration given the important role of each technology. For example, GoldenGate capture

processes can be deployed on a source database that is also a primary or a standby database in a

Data Guard configuration. Likewise, a database that is a target for GoldenGate replication can

also be a primary database protected by a Data Guard standby. This document provides best

practices for managing such a configuration when Data Guard role transitions are necessary

(switchover or failover operations where the primary and standby databases reverse roles).

The configuration described by this document contains two databases:

A Data Guard primary database – that can also be either a target or source for

GoldenGate replication

A Data Guard physical standby database - an exact replica of the primary database

maintained by Data Guard for disaster recovery.

Data Guard offers three distinct protection levels - each having different data protection and

performance characteristics - to provide customers the flexibility to address different

requirements. Only Data Guard Maximum Protection level can provide an absolute guarantee of

zero data loss. Other Data Guard protection levels (Maximum Availability and Maximum

Performance) have the potential for data loss either because they are not designed to protect

against multiple failure events or because data is transmitted asynchronously between primary

and standby databases. This document lays the groundwork to describe the step by step process

to integrate GoldenGate integrated extract with Data Guard so that GoldenGate processes can

also be switched-over/failed-over automatically with no manual intervention as part of Data

Guard switchover/failover. The switchover/failover is actually done by a UNIX shell program

which is triggered by the database system event "DB_ROLE_CHANGE" when

switchover/failover occurs.

3 System Configuration in GoldenGate-Data Guard installations

As shown in Figure 1-1, data from a Data Guard primary database is replicated to a target

database by GoldenGate. Data Guard maintains synchronization of the physical standby database

used for disaster recovery. The only time that GoldenGate will use the standby database to

send data to the target database is after a Data Guard switchover or failover operation has

promoted the standby database to the primary role. In this case, the application is moved to the

new primary database and it becomes the new source for the GoldenGate target.

Figure 1-1

Oracle Data Guard Protection levels

The following descriptions summarize the three levels of Data Guard protection.

Maximum Protection This protection level uses a synchronous replication process to ensure

that no data loss will occur if the primary database fails. It also enforces rules that prevent

multiple failure events from causing data loss. This protection level will never allow a primary

database to acknowledge commit success for an unprotected transaction. To provide this level of

protection, the redo data that is needed to recover each transaction must be written to both the

local online redo log and to the standby redo log on at least one standby database before Oracle

can acknowledge commit success to the application. To ensure that data loss cannot occur, the

primary database will shut down if a fault prevents it from writing its redo stream to the standby

redo log of at least one standby database.

Maximum Availability This protection level uses a synchronous replication process that

provides zero data loss protection without compromising the availability of the primary

database. Like Maximum Protection, commit success is not acknowledged to the application

until the redo that is needed to recover that transaction is written to the local online redo log and

to the standby redo log of at least one standby database. Unlike Maximum Protection, however,

the primary database does not shut down if a fault prevents it from writing its redo stream to a

remote standby redo log. The primary database will stall for a maximum of net_timeout seconds

(user configurable) before proceeding, in order to maintain availability of the primary database.

Data Guard automatically resynchronizes primary and standby databases when the connection is

restored. Data loss is possible if a second failure occurs before the resynchronization process is

complete.

Maximum Performance This protection mode (the default) is an asynchronous replication

process that provides the highest level of data protection that is possible without affecting the

performance of the primary database. This is accomplished by acknowledging commit success

as soon as the redo data that is needed to recover that transaction is written to the local online

redo log without waiting for confirmation by the standby that the data is protected. The redo data

stream of the primary database is transmitted to the standby database directly from the Oracle

in-memory log buffer as quickly as it is generated.

4 Administrative considerations and assumptions

Risks

When using any Data Guard protection level other than Maximum Protection, there is always the

risk of data loss whenever an unplanned failover occurs. This risk transfers to the GoldenGate

target as well, because:

The Data Guard primary, the standby, and the GoldenGate target might all be at different

sequence numbers in the transaction stream.

The GoldenGate target database could be ahead or behind the Data Guard standby database in

time given that there is no way to ensure that GoldenGate replication and Data Guard redo

transport are in lock-step.

Responsibilities

1. If using Maximum Protection or Maximum Availability, confirm that the failover will be a

zero data loss failover. Do this before the failover is executed, by checking the

values of PROTECTION_MODE and PROTECTION_LEVEL in V$DATABASE. If

both values match (e.g. mode and level each report Maximum Availability), there is

no data loss.

2. If there was no data loss, then:

Once GoldenGate finishes processing the source data that has accumulated in the

trails (from the original primary database), it can start processing any new

transactions from the new primary database (previously the standby database)

without any data loss.

This document outlines the steps that the shell program will perform in case of

switchover.

3. If you were using Maximum Performance, or if you were using Maximum Availability

and the PROTECTION_LEVEL in V$DATABASE says UNSYNCHRONIZED

(indicating that an earlier outage had impacted Data Guard transport) then the failover

will result in data loss (Data Guard was not able to transmit all committed transactions to

the standby database before the primary failed). If you can still access the original

primary after failover, determine how much and which data was lost, and how you want

to resolve the problem.

If the source and target databases are both Oracle databases, you can use

GoldenGate Veridata to identify the out-of-sync data.

This document outlines the steps that the shell program will perform in case of

failover.

In order to greatly simplify GoldenGate recovery it is strongly recommended you keep the

GoldenGate checkpoint files (in <GoldenGate home>/dirchk) as well as your trail files (at least

the most recent one(s)) on a storage device that can be accessed by the standby system.
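
The check described in responsibility 1 can be sketched as a small shell helper. This is a minimal sketch and not part of the scripts in the appendix: the function name zero_data_loss_expected is hypothetical, and it assumes the two values are passed in exactly as V$DATABASE reports them (upper case).

```shell
#!/bin/sh
# Sketch: decide whether a failover can be treated as zero data loss, given
# PROTECTION_MODE and PROTECTION_LEVEL as read from V$DATABASE.
# zero_data_loss_expected is a hypothetical helper name.
zero_data_loss_expected() {
    mode=$1
    level=$2
    case "$mode" in
        "MAXIMUM PROTECTION"|"MAXIMUM AVAILABILITY")
            # Zero data loss only when the current level matches the mode.
            [ "$mode" = "$level" ]
            ;;
        *)
            # MAXIMUM PERFORMANCE (asynchronous transport) or any other value.
            return 1
            ;;
    esac
}

# Example of obtaining the two values (assumes sqlplus is on PATH and that
# OS authentication as SYSDBA works; not executed here):
#   sqlplus -s "/ as sysdba" <<'EOF'
#   set heading off feedback off pagesize 0
#   select protection_mode || '|' || protection_level from v$database;
#   EOF
```

A non-zero return means data loss should be assumed and the steps in responsibility 3 apply.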

5 Administrative scenarios involved in GoldenGate-Data Guard installations

This section outlines a sample configuration where GoldenGate and Data Guard work concurrently on the same systems. Following this section are discussions about the typical administrative scenarios that are involved when using GoldenGate and Data Guard in the same environment.

The following is the configuration that is used in the discussions:

- System A is the primary Data Guard system.
- System B is the standby Data Guard system.
- Source and Target systems

The following are assumptions on which the discussions are based:

Scenario 1: Switchover
In a graceful, planned database switchover, no data loss will occur.

Scenario 2: Failover
In a site failure, data loss occurs if System B has not received all transactions that were committed at the primary database before it failed. If extract was behind on the source system (primary database) it can be positioned to recover data in the archive log files on the standby system.

Assumptions and prerequisites:

- The Oracle Database version must be 11.2.0.3 or higher, with the fix for bug 13560925, because integrated extract on the primary database is supported only from this version onwards.

- The shell script currently handles one extract, one extract pump, and one replicat. If your environment has more extracts or replicats, update the script, or use the AUTOSTART parameter in the GoldenGate Manager parameter file, which starts the extract processes automatically when Manager starts.

- The extract pump is assumed to run in "PASSTHRU" mode, i.e. with no database connection. If that is not the case, update the script accordingly.

- As part of the failover/switchover, the script opens "SSH" sessions between the primary and standby servers. Make sure that private/public keys are generated so that "SSH" between the servers works without a password. This is essential for the script to work.

- Before creating the "failover_actions" trigger that executes the switchover/failover script, update "run_user" and "run_group" in the $ORACLE_HOME/rdbms/admin/externaljob.ora file to the correct OS user and OS group.

For example:

*****************************************************************************

$ (coe-01)[dgp] /home/oracle/profile> vi $ORACLE_HOME/rdbms/admin/externaljob.ora

# NOTES

# For Porters: The user and group specified here should be a lowly privileged

# user and group for your platform. For Linux this is nobody

# and nobody.

# MODIFIED

# rramkiss 12/09/05 - Creation

#

##############################################################################

# External job execution configuration file externaljob.ora

#

# This file is provided by Oracle Corporation to help you customize

# your RDBMS installation for your site. Important system parameters

# are discussed, and default settings given.

#

# This configuration file is used by dbms_scheduler when executing external

# (operating system) jobs. It contains the user and group to run external

# jobs as. It must only be writable by the owner and must be owned by root.

# If extjob is not setuid then the only allowable run_user

# is the user Oracle runs as and the only allowable run_group is the group

# Oracle runs as.

run_user = oracle

run_group = oinstall

*****************************************************************************

- For RAC, after a switchover/failover the action script can be fired by any node of the new primary, but the action script works only on a designated server. To handle this, the trigger fires the "action.sh" script (Appendix E), which connects over SSH to the designated server and executes the actual failover script.

- For the "long-distance switchover" scenario, where the GoldenGate binaries are stored in an Oracle DBFS file system, the OS user that performs the failover (in our case "oracle") must have "sudo" access, and the "/etc/sudoers" file must be updated with the "NOPASSWD:" option so that the script can unmount the DBFS file system on the old primary server as part of the switchover/failover operation. The "Defaults requiretty" parameter in "/etc/sudoers" must also be commented out (on both the primary and standby hosts) so that the remote shell can execute "fusermount" via sudo.

#Defaults requiretty

oracle ALL=NOPASSWD: ALL
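
Several of the prerequisites above can be verified before the trigger is created. The following is a minimal preflight sketch under stated assumptions, not part of the original scripts: the helper names and the SSH_CMD variable are hypothetical (SSH_CMD exists only so the checks can be stubbed when testing), while the ssh options BatchMode and ConnectTimeout and the sudo -n flag are standard.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above. Helper names and SSH_CMD
# are hypothetical; SSH_CMD defaults to the real ssh client.

# Print the value of a key (run_user or run_group) from externaljob.ora,
# ignoring comment lines.
extjob_value() {
    awk -F= -v k="$1" '
        /^[ \t]*#/ { next }
        {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", value)
            if (name == k) { print value; exit }
        }' "$2"
}

# Succeeds only if ssh to the host works without a password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh_without_password() {
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Succeeds only if sudo works over ssh with no password and no terminal,
# which is what the remote fusermount call needs.
can_sudo_without_tty() {
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" sudo -n true
}
```

For example, `[ "$(extjob_value run_user "$ORACLE_HOME/rdbms/admin/externaljob.ora")" = "$(id -un)" ]` checks that external jobs run as the current OS user, and `can_sudo_without_tty <standby host>` fails if "Defaults requiretty" is still active or a password is required.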

6 Scenario 1: Switchover/Failover using a shared storage

Important! Do not follow these steps as-is on your systems. They are only an outline of the steps that

should be considered as part of a switchover procedure. The procedures in this document cannot

be applied to individual environments without in-depth analysis of the systems, databases, and

applications involved. Make sure to test the approach you plan to use so that you are aware of all

the steps involved.

Source Database Switchover/Failover to Standby System

There are 2 different scenarios for the switchover/failover. The first scenario assumes that the

checkpoint files and trail files are available to the standby system. We call this scenario "shared

storage switchover/failover". The second scenario assumes that checkpoint and trail files are not

shared but are installed in a DBFS file system which is part of the primary database. We

call this scenario "long-distance switchover/failover".

6.1.1 Shared storage switchover/failover

This scenario assumes checkpoint and trail files between source and standby database server are

or can be shared through the use of shared storage. Shared checkpoint files imply the same

name for the extract processes on primary and standby. The file structures (or, if you use

relative notations for files, e.g. ./dirdat, the paths relative to the GoldenGate home directory) have to match

between the primary and the standby system. On Unix or Linux-based systems you can use soft

links to achieve this.

The scenario below assumes that shared storage is in place.
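
The soft-link approach mentioned above might look like the following on each node. This is a sketch under assumptions, not the authors' procedure: the function name link_shared_dirs and the example paths are hypothetical, and only the dirchk and dirdat directories named in this document are linked.

```shell
#!/bin/sh
# Sketch: point the GoldenGate checkpoint (dirchk) and trail (dirdat)
# directories at shared storage. link_shared_dirs is a hypothetical helper.
link_shared_dirs() {
    ogg_home=$1      # local GoldenGate home directory
    shared_root=$2   # mount point of the shared storage
    for d in dirchk dirdat; do
        mkdir -p "$shared_root/$d" || return 1
        # Refuse to replace an existing real directory; it may hold data.
        if [ -e "$ogg_home/$d" ] && [ ! -L "$ogg_home/$d" ]; then
            echo "skip: $ogg_home/$d exists and is not a link" >&2
            return 1
        fi
        # Remove a stale link so the function can be re-run safely.
        if [ -L "$ogg_home/$d" ]; then
            rm "$ogg_home/$d" || return 1
        fi
        ln -s "$shared_root/$d" "$ogg_home/$d" || return 1
    done
}

# Example call with placeholder paths (not executed here):
#   link_shared_dirs /u01/app/ogg /shared/ogg
```

Running the same call on the primary and the standby gives both systems identical relative paths, which is what the shared checkpoint files require.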

System setup:

System A (primary): Extract A ------> DataPump A-----> (remote) Replicat A

System B (standby): Extract A ------> DataPump A-----> (remote) Replicat A

During regular processing only the GoldenGate processes on source System A are active (not on

System B).

Note that Extract A and Data Pump A on both systems share checkpoint files and trail files (and

the parameter files are identical except for maybe environment settings). Replicat A runs on the

target environment which is not involved in the switchover/failover.

Switchover/failover Steps and Procedure outline:

Get the script "gg_11gie_ext_shared.sh" from Appendix A and update the variables (such as

OGG_HOME) located at the beginning of the script to suit your environment.

Create the following trigger in the primary database

-- Single instance
CREATE OR REPLACE TRIGGER failover_actions AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
  role VARCHAR2(30);
BEGIN
  -- Act only on the database that has just become the primary
  SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
  IF role = 'PRIMARY' THEN
    -- Run the switchover/failover shell script as an external job
    dbms_scheduler.create_job (
      job_name   => '<schema_name>.<job_name>',
      job_type   => 'EXECUTABLE',
      job_action => '<path>/gg_11gie_ext_shared.sh',
      enabled    => TRUE);
  END IF;
END;
/

-- RAC
CREATE OR REPLACE TRIGGER failover_actions AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
  role VARCHAR2(30);
BEGIN
  -- Act only on the database that has just become the primary
  SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
  IF role = 'PRIMARY' THEN
    -- action.sh forwards the call to the designated GoldenGate server
    dbms_scheduler.create_job (
      job_name   => '<schema_name>.<job_name>',
      job_type   => 'EXECUTABLE',
      job_action => '<path>/action.sh',
      enabled    => TRUE);
  END IF;
END;
/

The trigger will get executed in the event of a switchover or failover and the primary node will

create and execute a database job which will in turn run the shell program. (For RAC the script

"action.sh" will be executed by the trigger which will in turn login to the designated OGG server

and will execute the main action script.) The script will first determine if the "role change" is due

to switchover or a failover.

In this case the program will execute the following steps

1. Login to the old primary (from where the database switched over) and stop the manager

and the pump process.

2. Start the GoldenGate manager, extract and extract pump in the new primary server.

Note: No other changes are required. You do not have to change or even restart replicat on the

target environment. Both extract and pump will resume processing at their respective recovery

and write checkpoints and there are no changes required to the processes.

6.1.2 Non shared-storage switchover/failover

This scenario assumes that it is not feasible to install GoldenGate binaries, checkpoint files and

trail files in a shared file system between the primary and the standby environment. So in this

scenario the recommended approach is to create a DBFS file system as part of the primary

database and install GoldenGate in that file system.

PLEASE NOTE THAT THERE CAN BE OVERHEAD ON THE PRIMARY DATABASE

FOR THE DBFS FILE SYSTEM BEING PART OF IT.

System setup:

Extract A ------> DataPump A-----> Replicat A

Extract B ------> DataPump B-----> Replicat B

Creating a DBFS File System

First we must create a tablespace to hold the file system.

CONN / AS SYSDBA

CREATE TABLESPACE dbfs_ts

DATAFILE '/u01/app/oracle/oradata/DB11G/dbfs01.dbf'

SIZE 1M AUTOEXTEND ON NEXT 1M;

Next, we create a user, grant DBFS_ROLE to the user and make sure it has a quota on the

tablespace. Trying to create a file system from the SYS user fails, so it must be done via another

user.

CONN / AS SYSDBA

CREATE USER dbfs_user IDENTIFIED BY dbfs_user

DEFAULT TABLESPACE dbfs_ts QUOTA UNLIMITED ON dbfs_ts;

GRANT CREATE SESSION, RESOURCE, CREATE VIEW, DBFS_ROLE TO dbfs_user;

Next we create the file system in tablespace by running the "dbfs_create_filesystem.sql" script as

the test user. The script accepts two parameters identifying the tablespace and file system name.

cd $ORACLE_HOME/rdbms/admin

sqlplus dbfs_user/dbfs_user

SQL> @dbfs_create_filesystem.sql dbfs_ts staging_area

The script created a partitioned file system. Although Oracle considers this the best option from a

performance and scalability perspective, it can have two drawbacks:

Space cannot be shared between partitions. If the size of the files is small compared to the

total file system size this is not a problem, but if individual files form a large proportion

of the total file system size, then ENOSPC errors may be produced.

File rename operations may require the file to be rewritten, which can be problematic for

large files.

If these issues present a problem to you, you can create non-partitioned file systems using the

"dbfs_create_filesystem_advanced.sql" script. In fact, the "dbfs_create_filesystem_advanced.sql"

script is called by the "dbfs_create_filesystem.sql" script, which defaults many of the advanced

parameters.

If we later wish to drop a file system, this can be done using the "dbfs_drop_filesystem.sql"

script with the file system name.

cd $ORACLE_HOME/rdbms/admin

sqlplus dbfs_user/dbfs_user

SQL> @dbfs_drop_filesystem.sql staging_area

FUSE Installation

In order to mount the DBFS we need to install the "Filesystem in Userspace" (FUSE) software.

If you are not planning to mount the DBFS or you are running on a non-Linux platform, this

section is unnecessary. The FUSE software can be installed manually, from the OEL media or

via Oracle's public yum server. If possible, use the Yum installation.

Yum FUSE Installation

First, configure the server to point to Oracle's public yum repository. The instructions for this are

available at "http://public-yum.oracle.com".

Next, install the kernel development package. It may already be present, in which case you will see

a "Nothing to do" message.

# yum install kernel-devel

Finally, install the FUSE software.

# yum install fuse fuse-libs

First we need to create a mount point with the necessary privileges as the "root" user.

# mkdir /mnt/dbfs

# chown oracle:oinstall /mnt/dbfs

Next, add a new library path.

# echo "/usr/local/lib" >> /etc/ld.so.conf.d/usr_local_lib.conf

Create symbolic links to the necessary libraries in the directory pointed to by the new library

path. Note. Depending on your installation the "libfuse.so.2" library may be in an alternative

location.

# export ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1

# ln -s $ORACLE_HOME/lib/libclntsh.so.11.1 /usr/local/lib/libclntsh.so.11.1

# ln -s $ORACLE_HOME/lib/libnnz11.so /usr/local/lib/libnnz11.so

# ln -s /lib64/libfuse.so.2 /usr/local/lib/libfuse.so.2

Issue the following command.

# ldconfig
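
After running ldconfig, it may be worth confirming that the three libraries linked above are actually visible to the dynamic linker. The following is a minimal sketch, assuming a Linux system where `ldconfig -p` prints the linker cache; missing_libs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which required libraries are absent from the linker cache.
# Reads `ldconfig -p` output on stdin and prints each library name given as
# an argument that does not appear in it. missing_libs is a hypothetical name.
missing_libs() {
    cache=$(cat)
    for lib in "$@"; do
        printf '%s\n' "$cache" | grep -F -q "$lib" || echo "$lib"
    done
}

# Example (as root, or with /sbin on PATH; not executed here):
#   ldconfig -p | missing_libs libclntsh.so.11.1 libnnz11.so libfuse.so.2
```

Empty output means all three libraries were found; any name printed points to a missing or misplaced symbolic link.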

The file system we've just created is mounted with one of the following commands from the

"oracle" OS user.

$ # Connection prompts for password and holds session.

$ dbfs_client dbfs_user@DB11G /mnt/dbfs

$ # Connection retrieves password from file and releases session.

$ nohup dbfs_client dbfs_user@DB11G /mnt/dbfs < passwordfile.f &

$ # Connection authenticates using wallet and releases session.

$ nohup dbfs_client -o wallet /@DB11G_DBFS_USER /mnt/dbfs &

The wallet authentication is the safest method as the others potentially expose the credentials.

Creation of a wallet is discussed later, but it is part of the Advanced Security option.

Once mounted, the "staging_area" file system is now available for use.

# ls -al /mnt/dbfs

total 8

drwxr-xr-x 3 root root 0 Jan 6 17:02 .

drwxr-xr-x 3 root root 4096 Jan 6 14:18 ..

drwxrwxrwx 3 root root 0 Jan 6 16:37 staging_area

# ls -al /mnt/dbfs/staging_area

total 0

drwxrwxrwx 3 root root 0 Jan 6 16:37 .

drwxr-xr-x 3 root root 0 Jan 6 17:02 ..

drwxr-xr-x 7 root root 0 Jan 6 14:00 .sfs

#

To unmount the file system issue the following command from the "root" OS user.

# fusermount -u /mnt/dbfs
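
Because the switchover script mounts and unmounts DBFS unattended, it helps to check the current state before issuing either command. A minimal sketch, assuming Linux (where /proc/mounts lists active mounts with the mount point in the second field); is_mounted is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: test whether a directory is currently a mount point.
# The optional second argument lets a different mounts table be supplied;
# it defaults to /proc/mounts. is_mounted is a hypothetical helper name.
is_mounted() {
    mnt=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 of each line is the mount point.
    awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$mounts_file"
}

# Example guards around the commands shown above (not executed here):
#   is_mounted /mnt/dbfs || nohup dbfs_client dbfs_user@DB11G /mnt/dbfs < passwordfile.f &
#   is_mounted /mnt/dbfs && fusermount -u /mnt/dbfs
```

Guarding the calls this way keeps a repeated run of the script from failing on an already mounted or already unmounted file system.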

Switchover/failover Steps and Procedure outline:

Get the script "gg_11gie_ext_non_shared.sh" from Appendix B and update the variables

(such as OGG_HOME) located at the beginning of the script to suit your environment.

Create the following trigger in the primary database

-- Single instance
CREATE OR REPLACE TRIGGER failover_actions AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
  role VARCHAR2(30);
BEGIN
  -- Act only on the database that has just become the primary
  SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
  IF role = 'PRIMARY' THEN
    -- Run the switchover/failover shell script as an external job
    dbms_scheduler.create_job (
      job_name   => '<schema_name>.<job_name>',
      job_type   => 'EXECUTABLE',
      job_action => '<path>/gg_11gie_ext_non_shared.sh',
      enabled    => TRUE);
  END IF;
END;
/

-- RAC
CREATE OR REPLACE TRIGGER failover_actions AFTER DB_ROLE_CHANGE ON DATABASE
DECLARE
  role VARCHAR2(30);
BEGIN
  -- Act only on the database that has just become the primary
  SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
  IF role = 'PRIMARY' THEN
    -- action.sh forwards the call to the designated GoldenGate server
    dbms_scheduler.create_job (
      job_name   => '<schema_name>.<job_name>',
      job_type   => 'EXECUTABLE',
      job_action => '<path>/action.sh',
      enabled    => TRUE);
  END IF;
END;
/

The trigger will get executed in the event of a switchover or failover and the primary node will

create and execute a database job which will in turn run the shell program. (For RAC the script

"action.sh" will be executed by the trigger which will in turn login to the designated OGG server

and will execute the main action script.) The script will first determine if the "role change" is due

to switchover or a failover.

In this case the program will execute the following steps

1. Mount the DBFS file system in the new primary

2. Login to the old primary (from where the database switched over), kill the manager and

pump process, and unmount the DBFS file system.

3. Start the GoldenGate manager, extract and extract pump in the new primary server.

7 System clock

Timing between environments: Having synchronized system clocks is essential for replication. The best approach is to synchronize the system clocks between all environments using ntp or some other time synchronization protocol.
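
A quick way to spot clock drift between two servers is to compare their epoch timestamps. The following is a rough sketch, not a replacement for NTP monitoring: clock_skew_seconds is a hypothetical helper, `date +%s` is assumed to be available (it is on Linux), and the SSH round trip adds a small error to the measurement.

```shell
#!/bin/sh
# Sketch: absolute difference, in seconds, between two epoch timestamps.
# clock_skew_seconds is a hypothetical helper name.
clock_skew_seconds() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Example comparison between this host and a standby (not executed here):
#   local_ts=$(date +%s)
#   remote_ts=$(ssh <standby node name> date +%s)
#   clock_skew_seconds "$local_ts" "$remote_ts"
```

A result of more than a few seconds suggests that time synchronization is not working on one of the systems.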

8 Conclusion

The goal of this document was to share an approach that automates the failover of OGG when used in a Fast Failover Data Guard environment. Please keep in mind that while this solution has been well tested for a very basic use case you most likely will need to modify it to meet your specific needs. Furthermore, since our goal is to share a method, it is your responsibility to certify that any solution you implement meets your specific business requirements as no warranty or support can be provided for your implementation of this method. When configuring GoldenGate in an environment that has either the source or target databases protected by Data Guard, you must plan for the possibility of data loss should a Data Guard failover occur. If there is data loss due to the failure of the primary database, you need to decide how to handle the differences between the GoldenGate source and the target databases.

9 Appendix

Appendix A – gg_11gie_ext_shared.sh

This appendix contains a Linux/Unix script that will perform switchover/failover of GoldenGate

extract/extract pump when triggered by "DB_ROLE_CHANGE" system event. This script is

designed to work in an environment where GoldenGate is installed in a shared file system

between the primary and the standby db server. Please change variables in the beginning of the

script to suit your environment.

Following is the script.

#!/bin/sh

#Set Variables

OGG_HOME=<GoldenGate home directory path>

FAL_NODE1=<primary node name>

FAL_NODE2=<standby node name>

PROFILE_NODE1=<primary node profile script name>

PROFILE_NODE2=<standby node profile script name>

NODE1_HOME=<primary node home directory where profile script resides>

NODE2_HOME=<standby node home directory where profile script resides>

extract=<extract name>

pump=<pump name>

V_WAIT_FOR_ARCHIVE=<time in seconds to wait for archivelog after

failover occurs>

#Set DB profile

v_host=`hostname`

if [ "$v_host" = "$FAL_NODE1" ]

then

cd $NODE1_HOME

. ./$PROFILE_NODE1

else

cd $NODE2_HOME

. ./$PROFILE_NODE2

fi

v_host=`hostname`

if [ "$v_host" = "$FAL_NODE1" ]

then

#Switchover steps

#Remote connection to stop mgr/pump in the failed node

ssh "$FAL_NODE2">/dev/null 2>&1 ". ./$PROFILE_NODE2;$OGG_HOME/ggsci <<EOFF

stop $pump

stop mgr!

exit

EOFF"

sleep $V_WAIT_FOR_ARCHIVE

$OGG_HOME/ggsci <<EOFF

start mgr

sh sleep 2

start $extract

start $pump

exit

EOFF

exit 0

else

#Remote connection to stop mgr/pump in the failed node

ssh "$FAL_NODE1">/dev/null 2>&1 ". ./$PROFILE_NODE1;$OGG_HOME/ggsci <<EOFF

stop $pump

stop mgr!

exit

EOFF"

sleep $V_WAIT_FOR_ARCHIVE

$OGG_HOME/ggsci <<EOFF

start mgr

sh sleep 2

start $extract

start $pump

exit

EOFF

exit 0

fi
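
The two branches of the script above differ only in which node is treated as the peer whose manager and pump are stopped. The following is a minimal sketch, not part of the original script, of how that choice could be made once so the stop and start steps need not be duplicated. The function name `peer_of` is hypothetical; it follows the same rule as the script (if the local host is FAL_NODE1 the peer is FAL_NODE2, otherwise the peer is FAL_NODE1).

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original script):
# given the local host name and the two configured node names, print the
# name of the other node, using the same test as the script's if/else.
peer_of() {
    local_host=$1
    node1=$2
    node2=$3
    if [ "$local_host" = "$node1" ]; then
        # Running on node 1, so the peer is node 2.
        printf '%s\n' "$node2"
    else
        # Any other host is treated as node 2, exactly as the script does.
        printf '%s\n' "$node1"
    fi
}
```

With this helper, the script could set `peer=$(peer_of "$(hostname)" "$FAL_NODE1" "$FAL_NODE2")` once and pass `$peer` to a single ssh call.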

Appendix B – gg_11gie_ext_non_shared.sh

This appendix contains a Linux/Unix script that performs switchover/failover of the GoldenGate extract and extract pump when triggered by the "DB_ROLE_CHANGE" system event. The script is designed for an environment where GoldenGate is installed on a DBFS file system that is part of the primary database.

Note: Remember to save the DBFS database user password in a text file and reference that file by its fully qualified path in the script. If storing the password in a file is a security concern, use Oracle wallet authentication instead.

Following is the script.

#!/bin/sh

#Set Variables

OGG_HOME=<GoldenGate home directory path>

DBFS_MNT=<DBFS mount point>

FAL_NODE1=<primary node name>

FAL_NODE2=<standby node name>

PROFILE_NODE1=<primary node profile script name>

PROFILE_NODE2=<standby node profile script name>

NODE1_HOME=<primary node home directory where profile script resides>

NODE2_HOME=<standby node home directory where profile script resides>

extract=<extract name>

pump=<extract pump name>

TNS_NODE1=<tns connect string for primary db>

TNS_NODE2=<tns connect string for standby db>

DBFSUSER=<DBFS username>

syspassword=<Oracle sys user password for the target db>

#Password file should be created manually and should reside in

#the following directory

PASSWORDFILE_NODE1=$NODE1_HOME/<passwordfile name>

PASSWORDFILE_NODE2=$NODE2_HOME/<passwordfile name>

V_WAIT_FOR_ARCHIVE=<time in seconds to wait for archivelog after failover occurs>

#Set DB profile and mount DBFS file system

v_host=`hostname`

if [ "$v_host" = "$FAL_NODE1" ]

then

cd $NODE1_HOME

. ./$PROFILE_NODE1

nohup dbfs_client $DBFSUSER@$TNS_NODE1 $DBFS_MNT <$PASSWORDFILE_NODE1 &

else

cd $NODE2_HOME

. ./$PROFILE_NODE2

nohup dbfs_client $DBFSUSER@$TNS_NODE2 $DBFS_MNT <$PASSWORDFILE_NODE2 &

fi

#Switchover/failover steps

echo "Switchover/failover steps"

v_host=`hostname`

if [ "$v_host" = "$FAL_NODE1" ]

then

#Remote connection to kill mgr/pump in the failed node

ssh "$FAL_NODE2">/dev/null 2>&1 "mgr_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/mgr.prm|grep -v grep|awk '{print \$2}'\`;
pump_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/${pump}.prm|grep -v grep|awk '{print \$2}'\`;
kill -9 \$mgr_proc_id;kill -9 \$pump_proc_id;sudo fusermount -u -z $DBFS_MNT"

sleep $V_WAIT_FOR_ARCHIVE

$OGG_HOME/ggsci <<EOFF

start mgr

sh sleep 2

start $extract

start $pump

exit

EOFF

else

#Remote connection to kill mgr/pump in the failed node

ssh "$FAL_NODE1">/dev/null 2>&1 "mgr_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/mgr.prm|grep -v grep|awk '{print \$2}'\`;
pump_proc_id=\`ps -ef|grep $OGG_HOME/dirprm/${pump}.prm|grep -v grep|awk '{print \$2}'\`;
kill -9 \$mgr_proc_id;kill -9 \$pump_proc_id;sudo fusermount -u -z $DBFS_MNT"

sleep $V_WAIT_FOR_ARCHIVE

$OGG_HOME/ggsci <<EOFF

start mgr

sh sleep 2

start $extract

start $pump

exit

EOFF

fi
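
The script above looks up the manager and pump process IDs with ps, grep and awk, then passes them to kill -9 even when no process was found, in which case kill is called without a PID. The following is a minimal sketch, not part of the original script, of the same lookup with a guard for the empty case. The function names `pids_for_prm` and `kill_if_any` are hypothetical, and the lookup reads `ps -ef` style output on standard input so it can be checked without live processes.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original script):
# read `ps -ef` style lines on stdin and print the PID (second column)
# of every line that mentions the given parameter file path. Lines that
# contain "grep" are skipped, mirroring the `grep -v grep` in the script.
pids_for_prm() {
    prm_path=$1
    awk -v prm="$prm_path" 'index($0, prm) > 0 && $0 !~ /grep/ { print $2 }'
}

# Hypothetical helper: send the given signal number to the listed PIDs,
# and do nothing (returning success) when the list is empty.
kill_if_any() {
    sig=$1
    shift
    if [ "$#" -eq 0 ]; then
        return 0
    fi
    kill "-$sig" "$@"
}
```

Run on the node where the processes live, a call such as `kill_if_any 9 $(ps -ef | pids_for_prm "$OGG_HOME/dirprm/mgr.prm")` would replace the unguarded kill. The command substitution is left unquoted on purpose so that several PIDs become separate arguments.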

Appendix E – action.sh (ONLY for RAC)

Following is the RAC script.

#!/bin/sh

OGG_HOST=<OGG designated server hostname or ip>

ssh "$OGG_HOST">/dev/null 2>&1 "<path>/<the action script name>"
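
Because the action script above sends all ssh output to /dev/null, a failed remote call leaves no trace. The following is a minimal sketch, not part of the original script, of a wrapper that appends the command output and exit status to a log file instead. The function name `run_logged` and the choice of log file are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper (an assumption, not from the original script):
# run the given command, append its stdout and stderr to the log file,
# then append one line with a timestamp, the exit status and the command.
# The wrapper returns the exit status of the wrapped command.
run_logged() {
    log_file=$1
    shift
    "$@" >>"$log_file" 2>&1
    status=$?
    printf '%s exit=%s cmd=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$*" >>"$log_file"
    return "$status"
}
```

In the RAC script, the ssh call could then be written as `run_logged <log file> ssh "$OGG_HOST" "<path>/<the action script name>"`, where the log file location is site specific.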