
Oracle 10g Backup Guide: A Small County Government Approach

Kevin Medlin, [email protected]

Abstract

Database backups are one of the most important parts of a database administrator’s job. Backup strategies need to be reviewed on a regular basis. Backups themselves need to be tested frequently. This document offers one approach to database backups for Oracle 10g databases on Windows 32 and 64 bit servers. Recovery Manager (RMAN) is employed as the primary backup application. RMAN is fast, flexible, and can compress the sometimes-large backup files. A Recovery Catalog is also used. Data Pump exports are used as a secondary backup application. All steps are automated, and scripts are provided with explanations in the document.

We have Oracle database 10g installed on Windows 2003, 32 and 64 bit servers. We keep our databases up almost 24x7, but do perform server re-boots in the early Sunday morning hours. We also occasionally take the servers down at scheduled times for maintenance on some of those Sunday mornings for things such as Windows patching and Oracle patching. We use Recovery Manager (RMAN) to perform online backups of our databases. RMAN is the Oracle suggested method for performing backup and recovery on databases (Chien, 2005, p. 3). RMAN is a great tool, but since we use Oracle standard edition not all of the benefits are available (parallelism, block media recovery, mean time to recover (MTTR), among others). Still, we are able to work RMAN into our backup strategy in a major way. With RMAN backups, we have been able to “refresh” our test databases upon request from developers or management. We also perform data pump exports on Oracle database 10g databases as additional safety measures. We use a combination of Windows batch scripts, SQL scripts, RMAN scripts, and scheduled tasks to automate these operations. Our main goal is to keep things as uniform as possible across the servers in an attempt to keep things simple. For the most part, we have been successful.

Our backup strategy is simple: take a weekly RMAN backup, back up archivelogs the rest of the week, and take a secondary backup with an export or Data Pump. Of course, there is much more to it than that. I will give the list of steps and then explain each one. The explanations include setup information, scripts, suggestions, and so on. I recommend reading through this document entirely before beginning, updating, or altering any backup plans you currently have in place.

Our 9 steps for a great 10g nightly backup strategy are:

1.  Delete old log files and rename current logs.

2.  Delete all RMAN backup files.


3.  Perform a level 0 (zero) RMAN backup.

4.  Create clone files.

5.  Create archivelog backup, which includes Recovery Catalog housekeeping.

6.  Delete data pump export files.

7.  Perform data pump export.

8.  Check logs for errors.

9.  Page and/or email short error description.

1. Delete old log files and rename current logs

This is performed every day. It is good practice to create a log file for all scripts. In step 8, I check all the logs for errors, so all current logs are renamed first; once renamed, it is easy to tell which errors refer to old jobs. The old logs eventually need to be deleted so they do not create a space issue.

Code Listing 1:

qgrep -l rman D:\oracle\admin\common\backup\logs\* >> %LOGFILE%
del /Q D:\oracle\admin\common\backup\logs\*.oldlog3 >> %LOGFILE%
ren D:\oracle\admin\common\backup\logs\*.oldlog2 *.oldlog3 >> %LOGFILE%
ren D:\oracle\admin\common\backup\logs\*.oldlog1 *.oldlog2 >> %LOGFILE%
ren D:\oracle\admin\common\backup\logs\*.log *.oldlog1 >> %LOGFILE%

Tip: Try to automate log file cleanup. It’s hard to remember everything!

2. Delete all RMAN backup files

This is only performed before a level 0 RMAN backup. Our RMAN backups are performed to the X: drive, X:\RMAN to be exact. We size this drive to hold our level 0 backups, the archivelogs, the archivelog backups, and the data pump exports. Clearing out the RMAN files on a weekly basis ensures that there will be enough space on the drive for the coming week's backups.

Code Listing 2:

# Running these commands will show the files that will be deleted in the next step.
FORFILES /p x:\rman /m *.bks /d -0 /c "CMD /C echo @PATH\@FILE @FDATE.@FTIME" >> %logfile%
FORFILES /p x:\rman /m *cfile* /d -0 /c "CMD /C echo @PATH\@FILE @FDATE.@FTIME" >> %logfile%

# These commands perform the actual deletion.
FORFILES /p x:\rman /d -0 /c "CMD /C del *.bks" >> %logfile%
FORFILES /p x:\rman /d -0 /c "CMD /C del *cfile*" >> %logfile%


Tip: Definitely automate deletion of large files on disk. You will surely run out of space at a bad time.

The X factor

The X: drive is a critical piece of this backup ballet. We regularly clone our production databases to test and development databases on other servers, or alternate nodes. RMAN likes to get its files from where it backed them up. The easiest way to do this is to back up to tape. This way, it makes no difference what server you are on when you perform the clone. When you tell RMAN your target database, it goes straight to the media management layer (MML) for the files it needs. Our problem with tape was the unreliability of tapes and tape drives. Our solution was to back up to disk. The problem then became making the backup files visible to every server under the same drive mapping. What we needed to do was map a SAN drive to our production server and then have our test server map to the same SAN drive at boot time. There was no way to have our SAN do this, but we could do it with a command at the command line from the server. We were able to solve this issue with a service from the Windows Resource Kit called AutoExNT. It basically allows us to run a batch file at boot time (Fenner, 2007). We put the command in there, and now production and test are mapped to the same X: drive.
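As a sketch of what that boot-time batch file might contain (the server and share names below are hypothetical; net use is the standard Windows mapping command):

rem autoexnt.bat - executed at boot by the AutoExNT service
rem \\sanserver\rman is a hypothetical UNC path to the shared SAN drive
net use X: \\sanserver\rman /persistent:no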

The X-files factor

AutoExNT works awesomely for the 32-bit servers. The problem comes when you have a 64-bit server. Unfortunately, there are no Windows Resource Kits available for 64-bit Windows, so no AutoExNT. Luckily, we were able to persuade our development staff to create a “Launcher” service for us that works on 64-bit Windows. It is the same thing as AutoExNT. Whatever we put in the batch file is executed when the server boots.

3. Perform a level 0 (zero) RMAN backup

This step is usually performed once per week. We have some larger, more active databases that create huge amounts of archivelogs. In a recovery or cloning scenario, restoring such a database from a single weekly backup would take too long. In these instances, we take more than one level 0 backup during the week. In the level 0 RMAN backup, we perform no Recovery Catalog maintenance. The script is called DBname_lvl_0_disk.rcv.

Code Listing 3:

#************* Configure controls for Recovery Manager *******************
#***************** in case they have been changed ************************
configure controlfile autobackup on;
configure controlfile autobackup format for device type disk to 'X:\rman\CFILE_%F.bks';
configure channel 1 device type disk format 'X:\rman\BACKUPPIECE_%d_%U_%T_%s.bks' maxpiecesize 2048M;
configure retention policy to recovery window of 34 days;
#************* Perform weekly RMAN level 0 Database Backup ***************
backup as compressed backupset incremental level = 0 database;
#************* Confirm level 0 Backup is valid ***************************
restore database check logical validate;

The first command configures RMAN so the control file and spfile will be backed up automatically. The second command instructs RMAN to name the file in a particular format. In this case, when backing up to disk, call the file 'X:\rman\CFILE_%F.bks'. Line three says create a disk channel and call it "1". Name the backup pieces 'X:\rman\BACKUPPIECE_%d_%U_%T_%s.bks', and any backup piece created should be no larger than 2 GB. The fourth line tells the Recovery Catalog that the backups for the target database should be considered good for 34 days. The fifth line actually performs the backup. It tells RMAN to create a compressed backup set, which really means one or more compressed backup pieces. It also says take a full backup of only the database; no archivelogs will be backed up. Since storage is a major issue, compressing backups has really helped out (Freeman, 2004, p. 90). We have found this to be one of the most useful RMAN features. The last line performs a check on the backup that just finished. It reads the backup files and validates them to make sure they are not corrupt or damaged in some way. I highly advise NOT skipping this step. Nothing could be worse than needing to restore a database and finding out too late that one or more of the files are no good!

Tip: You can use the files while they are being validated. Yes, it’s true! I have begun a clone or restore many times after the backup was completed but before the validation was done.

I call the RMAN level 0 backup script using a Windows batch file. The batch file is called DBname_lvl_0_disk.bat.

Code Listing 4:

set NLS_DATE_FORMAT=DD-MON-YYYY HH24:MI:SS
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
set CURDIR=D:\oracle\admin\common\backup

cd %CURDIR%

rman target ID/pword@DBname catalog rcatID/rcatpword@rcatname log=logs\DBname_lvl_0_disk.log @DBname_lvl_0_disk.rcv

page_on_backup.vbs DBname_lvl_0_disk.log page_DBname_lvl_0.log DBname

The first two lines set operating system environment variables. We prefer the more detailed date mask of "05-DEC-2007 23:59:59" rather than "05-DEC-2007". The date format becomes more important during times of recovery. Setting the NLS_LANG variable removes any doubt about which character set the database is using (Bersinic & Watson, 2005, ch. 23:p. 8). The third and fourth lines are important for using scheduled tasks. Windows needs to be directed to where the RMAN script is, so set the directory and then move there. Next, RMAN is actually called. The target and catalog are both logged into. A log file is created in a separate directory inside the current directory called "logs", and the script in Code Listing 3 is called. If there are any errors, a Visual Basic script is called that pages support personnel. If there are no errors, an email of the log file is sent. There will be more details on paging in section 9.
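The batch file itself is run from Windows Scheduled Tasks. As a hedged illustration only (the task name, day, and time are invented; schtasks is one way to register the job on Windows 2003):

schtasks /create /tn "DBname_lvl_0_disk" /tr "D:\oracle\admin\common\backup\DBname_lvl_0_disk.bat" /sc WEEKLY /d SUN /st 02:00:00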


4. Create clone files

This is a pivotal step to automating the “refresh” for test databases using RMAN backups. The clone files batch jobs create the actual “duplicate database” statements we use to clone our production databases to our test areas. This is a SQL script called create_TESTDB_clone_files.sql that runs on the production server.

Code Listing 5:

set echo off;
set feedback off;
set linesize 140;
SET PAGESIZE 0;
set trimspool on;
ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MON-DD HH24:MI:SS';
select checkpoint_change# from v$database;
alter system archive log current;
select sysdate from dual;
-- **************************************************************************
-- ******************************* TESTDBSERVER *****************************
-- **************************************************************************
--TESTDB1
spool \\TESTDBSERVER\d$\oracle\admin\common\clone\clone_to_TESTDB1.rcv
select 'duplicate target database to TESTDB1 until time ' ||''''|| sysdate ||''';' from dual;
spool off;
--TESTDB2
spool \\TESTDBSERVER\d$\oracle\admin\common\clone\clone_to_TESTDB2.rcv
select 'duplicate target database to TESTDB2 until time ' ||''''|| sysdate ||''';' from dual;
spool off;
--TESTDB3
spool \\TESTDBSERVER\d$\oracle\admin\common\clone\clone_to_TESTDB3.rcv
select 'duplicate target database to TESTDB3 until time ' ||''''|| sysdate ||''';' from dual;
spool off;
-- **************************************************************************
-- ********************************* THE END ********************************
-- **************************************************************************
alter system archive log current;
select sysdate from dual;
select checkpoint_change# from v$database;
select sysdate from dual;
exit;

This script runs on the production server and spools the output to the test server. The first thing you notice is the NLS_DATE_FORMAT setting. This is being set the same way it was set in the level 0 backup. Next, you see that we have selected the system change number, or SCN. Databases can also be cloned and/or recovered by using the SCN (Greenwald, Stackowiak & Stern, 2004, p. 151). We used to duplicate using the SCN but no longer do. We didn't remove this step because we like to see the SCN in the log file; in case of a production recovery scenario, the SCN is available in one additional location. In the next statement, we archive the current redo log. We have been performing RMAN duplications since Oracle 8i and always had issues with the logs. This was the only surefire way we could make it work every time. Next, we select sysdate; we like to see it under the SCN. Dropping down to the first spool statement, you see that an RMAN script is being written to TESTDBSERVER called clone_to_TESTDB1.rcv. There will be only one line in the script, and when written, it will look like this:

Code Listing 6:

duplicate target database to TESTDB1 until time '2007-DEC-05 19:55:00';

You can write a separate clone script for each test database on your test database server. As you can see, this is what we have done. We have some production database servers with more than one production database. For those, we run one of these scripts against each production database, creating a cloning script for each test database.

Code Listing 7:

set NLS_DATE_FORMAT=DD-MON-YYYY HH24:MI:SS
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
set CURDIR=D:\oracle\admin\common\batch

cd %CURDIR%

sqlplus system/pword@DBNAME @create_TESTDB_clone_files.sql > logs\create_TESTDB_clone_files.log

The batch file for this script is simple. The first two lines set your environment. The third and fourth lines are important for using scheduled tasks. Windows needs to be directed to where the SQL script is, so set the directory and then move there. Next, SQL*Plus is actually called and runs the script to create the clone files. A log file is created in a separate directory inside the current directory called "logs".
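As a hedged sketch of how one of the generated scripts might then be run from the test server (the connection strings follow the pattern used elsewhere in this document, and the auxiliary instance is assumed to already be started NOMOUNT):

rman target ID/pword@DBname auxiliary ID/pword@TESTDB1 catalog rcatID/rcatpword@rcatname log=logs\clone_to_TESTDB1.log @clone_to_TESTDB1.rcv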


5. Create archivelog backup which includes Recovery Catalog housekeeping

The archivelog backup is taken every day. We have already mentioned we normally take a level 0 backup once per week. Since this is a daily occurrence, we perform our RMAN Recovery Catalog maintenance in this step. The script is called DBNAME_arc_only_disk.rcv.

Code Listing 8:

#************* Configure controls for Recovery Manager ************
#************* in case they have been changed *********************
configure controlfile autobackup on;
configure controlfile autobackup format for device type disk to 'X:\rman\CFILE_%F.bks';
configure retention policy to recovery window of 34 days;

#************* Perform nightly RMAN Archivelog Backup *************
backup archivelog all format 'X:\rman\ARC_%d_%U_%T_%s.bks';

#************* Maintenance Step to remove old Archivelogs *********
delete noprompt archivelog until time 'SYSDATE - 3';

#************* Maintenance Steps to clean Recovery Catalog ********
report obsolete device type disk;
crosscheck backup completed before 'sysdate-34';
delete noprompt obsolete recovery window of 34 days device type disk;
delete noprompt expired backup device type disk;

#************* Show all controls configured for this **************
#************* Database in RMAN ***********************************
show all;

#************* List all RMAN disk backups *************************
list backup of database device type disk;

The first two commands configure RMAN for the control file and spfile autobackup. The first command turns it on so that every time a backup is run for a target this has been set for, the control file and spfile are backed up. The second command instructs RMAN on how to name the file in a particular format. In this case, when backing up to disk, call the file 'X:\rman\CFILE_%F.bks'. The third line tells the Recovery Catalog that the backups for the target database are good for 34 days. The fourth line actually performs the backup. It tells RMAN to back up all the archivelogs on disk in the specified format. The next step removes all archivelogs older than three days. We like to keep three days of archivelogs on disk. Now we start in on the Recovery Catalog maintenance. Catalog maintenance is very important. If these files were deleted and the maintenance steps not performed, then the Recovery Catalog would contain information about backups that were no longer significant (Alapati, 2005, p. 661). The next step reports obsolete backups made to disk that fall outside our retention policy of 34 days. Any older backups are considered obsolete and can be deleted. The crosscheck command will notify you whether or not any of your files are missing. If they are, they will be marked as expired. The next two delete commands remove the obsolete and expired information from the catalog. Remember, we delete all of our RMAN backup files from disk every 7 days. Our retention policy is set to 34 days because that is our on-site tape retention policy. If needed, we could restore RMAN files up to 34 days old back to a server. Could we recover the files to a database? Yes, we could! How? Because we have a 34 day retention policy and all the RMAN information about those backups is still in the Recovery Catalog! Next, the show all command gives the configured parameters we have in RMAN (Hart & Freeman, 2007, p. 89). The list command shows all the disk backups that are still relevant in the Recovery Catalog.

Tip: Obviously, the Recovery Catalog is very important and needs to be backed up (Looney, 2004, p. 918). It is also the easiest to recover. If you lose your Recovery Catalog and you’re in a pinch, you can import the Recovery Catalog schema into any database and voila! You’ve got a new Recovery Catalog (Exporting and Importing, 2005).
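As a hedged sketch of that rescue, assuming the catalog schema is the rcatID account used in the connection strings above and the classic export/import utilities are used (newdb is a placeholder for whichever database receives the schema):

exp rcatID/rcatpword@rcatname owner=rcatID file=rcat.dmp log=rcat_exp.log
imp rcatID/rcatpword@newdb fromuser=rcatID touser=rcatID file=rcat.dmp log=rcat_imp.log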

The batch file for this script is simple. The batch file is called DBNAME_arc_only_disk.bat.

Code Listing 9:

set NLS_DATE_FORMAT=DD-MON-YYYY HH24:MI:SS
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
set CURDIR=D:\oracle\admin\common\backup

cd %CURDIR%

rman target ID/pword@DBname catalog rcatID/rcatpword@rcatname log=logs\DBNAME_arc_only_disk.log @DBNAME_arc_only_disk.rcv

page_on_backup.vbs DBNAME_arc_only_disk.log page_DBNAME_arc_only.log DBNAME

The first two lines set variables for your environment. The third and fourth lines are important for using scheduled tasks. Windows needs to be directed to where the RMAN script is, so set the directory and then move there. Next, RMAN is called. The target and the catalog are logged into. A log file is created in a separate directory inside the current directory called "logs", and the script in Code Listing 8 is called. If there are any errors, a Visual Basic script is called that pages support personnel. If there are no errors, an email of the log file is sent. There will be more details on paging in section 9.

6. Delete data pump export files

Data Pump needs new file names for the dump files each time it runs. Unlike export, it will not overwrite old dump files. So prior to any nightly data pump scheduled task, the old data pump files need to be removed. This batch file does just that.

Code Listing 10:


set CURDIR=D:\oracle\admin\common\batch
cd %CURDIR%
set logfile=logs\delete_Xdrive_expdp_files.log
echo 'logfile = ' %logfile% > %logfile%
echo . >> %logfile%
echo . >> %logfile%
echo '*********************************************************' >> %logfile%
echo '* The following files will be deleted. *' >> %logfile%
echo '*********************************************************' >> %logfile%
echo . >> %logfile%
echo . >> %logfile%
FORFILES /p X:\data_pump\DMPs /m *.dmp /d -0 /c "CMD /C echo @PATH\@FILE @FDATE.@FTIME" >> %logfile%
echo . >> %logfile%
echo . >> %logfile%
echo '*********************************************************' >> %logfile%
echo '* Starting deletes now ... *' >> %logfile%
echo '*********************************************************' >> %logfile%
echo . >> %logfile%
echo . >> %logfile%
FORFILES /p X:\data_pump\DMPs /d -0 /c "CMD /C del *.dmp" >> %logfile%
echo . >> %logfile%
echo . >> %logfile%

As we’ve seen in the other batch scripts here, initially we set the directory then move there. We also set the log file as a variable since we will be using it frequently. In fact, the first entry into the log file is the log file name. The echoes with dots are just for better readability in the log. There are really only two significant commands in this script and both of them are FORFILES. The first one simply lists the files that will be deleted. The second one actually performs the deletion of the files.

7. Perform data pump export

As an additional safety measure in our portfolio, we also take nightly data pump exports. As an added advantage, when we need a table or two restored, it is far easier to get them from an export than from RMAN.
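For example, a hedged sketch of pulling a single table back out of one of these nightly dumps; the schema and table here (SCOTT.EMP) are invented for illustration, and table_exists_action tells Data Pump what to do if the table is still present:

impdp ID/pword@DBNAME dumpfile=x_dp_dumps:DBNAME_FULL_%U.dmp logfile=x_dp_logs:restore_emp.log tables=SCOTT.EMP table_exists_action=replace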

We use par files to hold our data pump commands, just like regular exports (Kumar, Kanagaraj & Stroupe, 2005). We have directory objects created in the database; you will see them referenced in the par file. These are the SQL commands used to create them:

Code Listing 11:

create directory x_dp_dumps as 'X:\data_pump\DMPs';
create directory x_dp_logs as 'X:\data_pump\logs';

These tell Data Pump where to send the dump files and log files. Here are the par file contents:


Code Listing 12:

content = all
dumpfile = x_dp_dumps:DBNAME_FULL_%U.dmp
estimate = statistics
full = y
job_name = DBNAME_FULL
logfile = x_dp_logs:DBNAME_FULL.log
filesize = 2G

Content equals all means we want to export everything, with no exclusions. The dumpfile parameter asks for a file location and name. The location is given as a directory object. The file name uses a substitution variable, %U, which will be replaced by a two-digit integer starting with 01. One file could be created or many, depending on the database size. Estimate gives you a good idea of what size your dump file will be. Block is the default, but we use statistics since ours are current. Full specifies whether or not you want a full database mode export. Job_name is a preference, in case you like to name your own. Logfile is set up similarly to dumpfile: it asks for a file location and name; the location is given as a directory object and the name is also given. Filesize we use as a preference. We like to keep our file sizes to 2 GB or less. When copying or compressing, it is far easier and faster to move or compress 10 files at the same time than 1 big file.

We call data pump as a scheduled task, but we set it up a little differently. We have an individual par file for each database and one common batch file to execute them. Here is the command used in Scheduled Tasks:

Code Listing 13:

D:\oracle\admin\common\expdp\expdp_DATABASE.bat DBNAME

Here is the actual batch file used to call the data pump par files.

Code Listing 14:

set DATABASE=%1
set ORACLE_HOME=D:\oracle\product\10.2.0\db_1

%ORACLE_HOME%\bin\expdp ID/pword@%DATABASE% parfile=D:\oracle\admin\common\expdp\expdp_%DATABASE%.par

The only thing being passed to the batch file is the database name. It becomes %DATABASE%. Performing data pump exports in this manner has worked out pretty well for us.

8. Check logs for errors

Every night after all the batch jobs on a server have completed, we run an error check on that server. It is a simple batch file that performs a qgrep on key words in key logs and formats the information in an easily readable fashion. As previously stated, you can easily tell the old logs from the current logs by how the files are named. Here is the batch file, called error_check.bat:


Code Listing 15:

error_check.bat

set SERVER=DBSERVER
set LOGFILE=error_check.log
echo ************************************************************ > %LOGFILE%
echo *********************************************************** >> %LOGFILE%
echo *********************************************************** >> %LOGFILE%
echo %SERVER% >> %LOGFILE%
echo Daily Error Report >> %LOGFILE%
date /T >> %LOGFILE%
echo *********************************************************** >> %LOGFILE%
echo *********************************************************** >> %LOGFILE%
echo *********************************************************** >> %LOGFILE%
echo . >> %LOGFILE%
echo . >> %LOGFILE%
echo *********************************************************** >> %LOGFILE%
echo The following files have been found with errors. >> %LOGFILE%
echo *********************************************************** >> %LOGFILE%
echo . >> %LOGFILE%
echo . >> %LOGFILE%
echo Backup files >> %LOGFILE%
echo . >> %LOGFILE%
qgrep -l RMAN- D:\oracle\admin\common\backup\logs\* >> %LOGFILE%
echo . >> %LOGFILE%
echo . >> %LOGFILE%
echo Batch files >> %LOGFILE%
echo . >> %LOGFILE%
qgrep -l error D:\oracle\admin\common\batch\logs\* >> %LOGFILE%
echo . >> %LOGFILE%
echo . >> %LOGFILE%
echo Clone files >> %LOGFILE%
echo . >> %LOGFILE%
qgrep -l RMAN- D:\oracle\admin\common\clone\logs\* >> %LOGFILE%
echo . >> %LOGFILE%
echo . >> %LOGFILE%
echo Alert Logs >> %LOGFILE%
echo . >> %LOGFILE%
@rem
qgrep -l ORA- D:\oracle\product\10.2.0\admin\DBNAME1\udump\* >> %LOGFILE%
qgrep -l ORA- D:\oracle\product\10.2.0\admin\DBNAME2\udump\* >> %LOGFILE%
@rem
echo . >> %LOGFILE%
echo . >> %LOGFILE%

The backup file check is for RMAN errors. The batch file check is for errors with file deletions and creations. The clone file error check is for failed database duplications. The alert log check is a little misleading: it actually checks the udump directories for files with errors. Shortly after the log is created, we send it to ourselves using a free email client called Bmail from Beyond Logic. This is what our email batch file, email_errors.bat, looks like:


Code Listing 16:

bmail -s 10.10.10.10 -t [email protected] -f [email protected] -h -a "DBSERVER Daily Error Report" -m error_check.log
bmail -s 10.10.10.10 -t [email protected] -f [email protected] -h -a "DBSERVER Daily Error Report" -m error_check.log

9. Page and/or email short error description

Some jobs need immediate notification upon failure. For these, we use a Visual Basic script that determines whether we have an error and, if so, immediately sends us a page. This script runs at every execution and sends an email with the log output. This is something we want on these jobs regardless of whether the job completes successfully or not. But if it fails, we want an email of the log and a page indicating the failure. The script is called with three arguments, like this:

Code Listing 17:

page_on_backup.vbs DBNAME_arc_only_disk.log page_DBNAME_arc_only.log DBNAME

The arguments are log name, script log name, and database name. Here is a copy of page_on_backup.vbs. This is the script that runs in our RMAN level 0 backups and archive log backups.

Code Listing 18:

'This script emails the log file for a backup and searches it for the phrase "ORA-". If found, it pages the recipients
'Additional pager Numbers
' whodat - [email protected]
' whodis - [email protected]

Dim ArgObj, var1, var2, var3
Set ArgObj = WScript.Arguments
var1 = ArgObj(0)
var2 = ArgObj(1)
var3 = ArgObj(2)

'email log files

Dim WshShell1 : Set WshShell1 = CreateObject("WScript.Shell")
WshShell1.Run("D:\oracle\admin\common\error\bmail -s 10.10.10.10 -t [email protected] -f [email protected] -h -a " & var1 & " attached -m d:\oracle\admin\common\backup\logs\" & var1 & "")
WshShell1.Run("D:\oracle\admin\common\error\bmail -s 10.10.10.10 -t [email protected] -f [email protected] -h -a " & var1 & " attached -m d:\oracle\admin\common\backup\logs\" & var1 & "")

'msgbox "var1 = " & var1 & " var2 = " & var2 & ""

Const ForReading = 1, ForWriting = 2


Set WshNetwork = WScript.CreateObject("WScript.Network")
Dim lgmain : Set lgmain = CreateObject("Scripting.FileSystemObject")
Dim lgmain2 : Set lgmain2 = lgmain.OpenTextFile("D:\Oracle\Admin\common\backup\logs\" & var2 & "", ForWriting, True)

lgmain2.WriteLine "Processing began: " & Now
lgmain2.WriteLine ""
lgmain2.WriteLine ""

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "ORA-"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("D:\oracle\admin\common\backup\logs\" & var1 & "", ForReading)
strSearchString = objFile.ReadAll
objFile.Close

Set colMatches = objRegEx.Execute(strSearchString)
If colMatches.Count > 0 Then
    Dim WshShell2 : Set WshShell2 = CreateObject("WScript.Shell")
    WshShell2.Run("D:\oracle\admin\common\error\bmail -s 10.10.10.10 -t [email protected] -f " & var3 & "@thecountyoverhere.gov -h -a " & var3 & "_BACKUP_ERRORS_FOUND")
    WshShell2.Run("D:\oracle\admin\common\error\bmail -s 10.10.10.10 -t [email protected] -f " & var3 & "@thecountyoverhere.gov -h -a " & var3 & "_BACKUP_ERRORS_FOUND")
    WshShell2.Run("D:\oracle\admin\common\error\bmail -s 10.10.10.10 -t [email protected] -f " & var3 & "@thecountyoverhere.gov -h -a " & var3 & "_BACKUP_ERRORS_FOUND")
    lgmain2.WriteLine "page completed"
End If
If colMatches.Count = 0 Then
    lgmain2.WriteLine "no problems found, no page required"
End If

The first thing that happens is that the log file is emailed. Next, the log is searched for the error codes. If an error is found, a page is also sent. If not, the script completes without paging.

Conclusion

A successful backup plan is a major portion of any database administrator's overall database strategy. Backups must be carefully planned and checked often. Automation is a good thing and can be very useful, but it must be thoroughly defined and rigorously tested. It can be done if you think about your environment logically. Ask questions such as, "What must be done first? What must be done next?" and so on. When you reach a roadblock, think about other ways you can perform the same task. This becomes easier if you think through your environment and what you would like to accomplish ahead of time; doing so could keep you from having to backtrack. Sometimes changing the order of tasks may accomplish your goal as well.


Recovering from Loss of All Control Files

In a prior article about backup and recovery exercises, one of the scenarios dealt with recovering from the loss of a control file. In that scenario, the database was running with more than one control file, so recovering from the (self-induced, but for instructional purposes) media failure was pretty simple. When installing Oracle and creating a seed database, few DBAs have missed the incongruence between Oracle's advice to multiplex control files (specifically, to place them on more than one disk) and the installer's creation of all the control files in the same directory.

It's easy to understand why the files are created in the same directory. It's a trade-off between installing the software in a relatively simple manner and requiring users, most of whom are probably quite new at this, to have more than one disk. In other words, how many of you bang on Oracle with your work or home PC, and of those computers, how many have more than one disk/drive? The installer gives DBAs a good running start on getting a database created and having it up and running in short order. The installation is not perfect, but on the other hand, it's not that bad either.

Given that virtually all OUI/standard template creation of databases will have the control files in the same location, it is not surprising to see questions on the various Oracle question and answer forums in the nature of, “I’ve lost all of my control files, and I don’t have a backup. Urgently waiting your reply.” Despite the best advice, it happens. In fact, it happens in production environments per what the forum posters claim.

For those of you who find yourself in this situation, the good news is that yes, you can recover the database. Even better, the recovery steps are not that hard to perform. The bad news is that you may lose some data. “Data,” in this sense, is data within the prior version of the control files. If you were using the control file as the RMAN repository, the current set of backup information will be lost. However, you can manually add what appears to be orphaned backup sets/pieces to the repository (and should this happen to you, it will also serve as a good example of why the repository should be stored in a recovery catalog as opposed to solely within the target’s control files). Let’s start the example by removing/deleting the control files.

Assuming you are doing this on Windows, the files will be locked, so this exercise requires a shutdown of the database. Take a cold backup so you can recover (technically, restore) in case something in your environment goes awry. I've performed this several times on my at-work "bang around" database. Really, this works.

Shown below are the current files of my DB10 database, plus a Backup folder with copies of all files. The database is shut down.


The control files will be renamed and then a startup will be attempted.

--Clean shutdown to release the locks on the files

SQL> conn sys/oracle as sysdba
Connected.
SQL> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.

--Startup issued with the control files missing

SQL> startup
ORACLE instance started.

Total System Global Area  289406976 bytes
Fixed Size                  1290184 bytes
Variable Size             130023480 bytes
Database Buffers          150994944 bytes
Redo Buffers                7098368 bytes
ORA-00205: error in identifying control file, check alert log for more info

Now we can treat this situation as having to create control files. What is the current state of the instance/database? If the control files are missing, then the database cannot be mounted, so at most, the database state is nomount. Any attempt to mount will fail because of the missing files.

SQL> select status from v$instance;

STATUS
------------
STARTED

SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-00205: error in identifying control file, check alert log for more info

No problem, we’ll use the command to create a control file from trace.

SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
ALTER DATABASE BACKUP CONTROLFILE TO TRACE
*
ERROR at line 1:
ORA-01507: database not mounted

As you can see, not being mounted is a show stopper for this approach. What is required is to manually create a statement which when run, creates a control file. We need the names of all of the datafiles in the database for this step, so it is helpful to have those listed ahead of time (which I did in this case, see the first diagram). Using what Windows is reporting as the file sizes (in KB), we can construct a CREATE CONTROLFILE statement as shown:

CREATE CONTROLFILE REUSE DATABASE "DB10" NORESETLOGS NOARCHIVELOG
  MAXLOGFILES 50
  MAXLOGMEMBERS 3
  MAXDATAFILES 300
  MAXINSTANCES 8
  MAXLOGHISTORY 500
LOGFILE
  GROUP 7 'D:\oracle\product\10.2.0\oradata\db10\redo07.log' SIZE 5121K,
  GROUP 8 'D:\oracle\product\10.2.0\oradata\db10\redo08.log' SIZE 5121K,
  GROUP 9 'D:\oracle\product\10.2.0\oradata\db10\redo09.log' SIZE 5121K
DATAFILE
  'D:\oracle\product\10.2.0\oradata\db10\users01.dbf' SIZE 79368K,
  'D:\oracle\product\10.2.0\oradata\db10\undotbs01.dbf' SIZE 128008K,
  'D:\oracle\product\10.2.0\oradata\db10\system01.dbf' SIZE 614408K,
  'D:\oracle\product\10.2.0\oradata\db10\sysaux01.dbf' SIZE 593928K,
  'D:\oracle\product\10.2.0\oradata\db10\example01.dbf' SIZE 174088K;
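Constructing this statement depends on knowing every redo log and datafile name, which is a good argument for capturing them while the database is healthy. A minimal sketch of queries whose spooled output could be kept with the backups:

select group#, member from v$logfile;
select file_name, bytes/1024 as kb from dba_data_files;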

Here is where the old match-the-filename-to-the-tablespace-name naming convention comes in handy. We don’t have to match the datafile names to the tablespace names, but we do have to match the redo log filename to the redo log group. That can be tricky because with N groups, there will be N! ways of making those assignments. In this example, we are fortunate because the log member has a name indicative of the log group to which it belongs. Let’s issue the statement and see what happens.

SQL> CREATE CONTROLFILE REUSE DATABASE "DB10" NORESETLOGS NOARCHIVELOG
  2  MAXLOGFILES 50
  3  MAXLOGMEMBERS 3
  4  MAXDATAFILES 300
  5  MAXINSTANCES 8
  6  MAXLOGHISTORY 500
  7  LOGFILE
  8  GROUP 7 'D:\oracle\product\10.2.0\oradata\db10\redo07.log' SIZE 5121K,
  9  GROUP 8 'D:\oracle\product\10.2.0\oradata\db10\redo08.log' SIZE 5121K,
 10  GROUP 9 'D:\oracle\product\10.2.0\oradata\db10\redo09.log' SIZE 5121K
 11  DATAFILE
 12  'D:\oracle\product\10.2.0\oradata\db10\users01.dbf' SIZE 79368K,
 13  'D:\oracle\product\10.2.0\oradata\db10\undotbs01.dbf' SIZE 128008K,
 14  'D:\oracle\product\10.2.0\oradata\db10\system01.dbf' SIZE 614408K,
 15  'D:\oracle\product\10.2.0\oradata\db10\sysaux01.dbf' SIZE 593928K,
 16  'D:\oracle\product\10.2.0\oradata\db10\example01.dbf' SIZE 174088K;
CREATE CONTROLFILE REUSE DATABASE "DB10" NORESETLOGS NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01163: SIZE clause indicates 9921 (blocks), but should match header 9920
ORA-01110: data file 4: 'D:\oracle\product\10.2.0\oradata\db10\users01.dbf'

This error looks pretty serious: do we have to start figuring out the number of blocks for at least this file and all the others? Yes, but it is easy. The size can be figured as follows:

Expected size = Expected # of blocks * db_block_size / 1024

Extract the db_block_size from “show parameter db_block” and see (in my database) 8192. The USERS datafile size in the CREATE CONTROLFILE statement should then be 9920 * 8192 / 1024 = 79360K. Replace the OS reported value of 79368 with 79360 in the CREATE CONTROLFILE statement and re-issue it:

CREATE CONTROLFILE REUSE DATABASE "DB10" NORESETLOGS NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01163: SIZE clause indicates 16001 (blocks), but should match header 16000
ORA-01110: data file 2: 'D:\oracle\product\10.2.0\oradata\db10\undotbs01.dbf'

The datafile for the UNDO tablespace has the same problem USERS did, but we can see a trend: the number of blocks is off by one, so let's reduce the size the same way (subtract 8K) for this and the remaining datafiles.

SQL> CREATE CONTROLFILE REUSE DATABASE "DB10" NORESETLOGS NOARCHIVELOG
  2  MAXLOGFILES 50
  3  MAXLOGMEMBERS 3
  4  MAXDATAFILES 300
  5  MAXINSTANCES 8
  6  MAXLOGHISTORY 500
  7  LOGFILE
  8  GROUP 7 'D:\oracle\product\10.2.0\oradata\db10\redo07.log' SIZE 5121K,
  9  GROUP 8 'D:\oracle\product\10.2.0\oradata\db10\redo08.log' SIZE 5121K,
 10  GROUP 9 'D:\oracle\product\10.2.0\oradata\db10\redo09.log' SIZE 5121K
 11  DATAFILE
 12  'D:\oracle\product\10.2.0\oradata\db10\users01.dbf' SIZE 79360K,
 13  'D:\oracle\product\10.2.0\oradata\db10\undotbs01.dbf' SIZE 128000K,
 14  'D:\oracle\product\10.2.0\oradata\db10\system01.dbf' SIZE 614400K,
 15  'D:\oracle\product\10.2.0\oradata\db10\sysaux01.dbf' SIZE 593920K,
 16  'D:\oracle\product\10.2.0\oradata\db10\example01.dbf' SIZE 174080K;
CREATE CONTROLFILE REUSE DATABASE "DB10" NORESETLOGS NOARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-01163: SIZE clause indicates 10242 (blocks), but should match header 10240
ORA-01517: log member: 'D:\oracle\product\10.2.0\oradata\db10\redo07.log'

It appears the datafiles part worked, and now the redo log sizes need some adjustment. Using a value of 5120K instead of the OS reported value of 5121K yields:

SQL> CREATE CONTROLFILE REUSE DATABASE "DB10" NORESETLOGS NOARCHIVELOG
  2  MAXLOGFILES 50
  3  MAXLOGMEMBERS 3
  4  MAXDATAFILES 300
  5  MAXINSTANCES 8
  6  MAXLOGHISTORY 500
  7  LOGFILE
  8  GROUP 7 'D:\oracle\product\10.2.0\oradata\db10\redo07.log' SIZE 5120K,
  9  GROUP 8 'D:\oracle\product\10.2.0\oradata\db10\redo08.log' SIZE 5120K,
 10  GROUP 9 'D:\oracle\product\10.2.0\oradata\db10\redo09.log' SIZE 5120K
 11  DATAFILE
 12  'D:\oracle\product\10.2.0\oradata\db10\users01.dbf' SIZE 79360K,
 13  'D:\oracle\product\10.2.0\oradata\db10\undotbs01.dbf' SIZE 128000K,
 14  'D:\oracle\product\10.2.0\oradata\db10\system01.dbf' SIZE 614400K,
 15  'D:\oracle\product\10.2.0\oradata\db10\sysaux01.dbf' SIZE 593920K,
 16  'D:\oracle\product\10.2.0\oradata\db10\example01.dbf' SIZE 174080K;

Control file created.

Why did the redo log size get reduced by 1K instead of 8K as with the datafiles? Or, for files that are approximately 5MB in size, why are there so many more blocks (over 10,000) when compared to what the approximately 78MB USERS datafile has (9,920)? The answer lies in the size of the redo log blocks. On Windows, these blocks can be 512 bytes. For the redo logs then:

Expected size = Expected # of blocks * 512 / 1024

The 1K reduction comes from the difference of two blocks at 512 bytes each.

Now that the control file(s) has been created, what is the state of the database? Verify that your files have been created, and then select the status from V$INSTANCE and see MOUNTED. If the database is mounted, can it be opened? The answer is yes.

SQL> select status from v$instance;

STATUS
------------
MOUNTED

SQL> alter database open;

Database altered.

Connect as a user and perform any operation which requires the temporary tablespace (or tablespace group, in 10g and above). What do you see? Up to a point, the database may appear to be fine, but what do the following query and result tell you?

SQL> select * from dba_temp_files;

no rows selected

This presents an interesting situation: the temp tablespace is online and at the same time, there are no datafiles associated with it. In fact, you can perform a shutdown and startup and see that the temporary tablespace temp file is untouched. If the query or statement you used to test the database required some temp space for sorting, you will see the ORA-25153 error.

[/home/oracle]$ oerr ora 25153
25153, 00000, "Temporary Tablespace is Empty"
// *Cause: An attempt was made to use space in a temporary tablespace with
// no files.
// *Action: Add files to the tablespace using ADD TEMPFILE command.
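Following that advice, a minimal sketch; the tempfile name, path, and size are assumptions based on the DB10 layout used above:

SQL> alter tablespace temp add tempfile
  2  'D:\oracle\product\10.2.0\oradata\db10\temp01.dbf' size 100M;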

Very easy fix, and once that is complete, your database is fully functional.

In Closing

It is interesting how easily the loss of such a critical file can be fixed, compared to what it takes to restore a datafile. The example presented here should give you some confidence that losing all of your control files isn't the end of the world. Nonetheless, keep in mind that if your RMAN repository was in the control files, you will need to re-create it.
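And if the backup pieces are still on disk, they can be re-cataloged into the rebuilt repository with RMAN's CATALOG command; a hedged sketch, using the X:\rman location from the first part of this guide:

RMAN> catalog start with 'X:\rman';

RMAN will list what it finds under that path and prompt before cataloging it; individual files can also be added one at a time with CATALOG BACKUPPIECE.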


Hands-on Oracle: Backup and Recovery Games - Creating Datafiles

We know that in Oracle, certain things are and can be done at certain times. One of those operations pertains to adding or creating datafiles. One operation where adding datafiles to a database is common is within or during the CREATE DATABASE-related statements. Even if using Oracle managed files and accepting defaults, datafiles will be created. Creating tablespaces also includes a provision for adding datafiles. Yet another add/create datafile event takes place when altering a tablespace for growth-related purposes (altering a tablespace by adding a datafile). In all of these scenarios, there is one thing in common: when the datafile is added, it is associated with a tablespace you have named or identified in the DDL statement. To categorize the commands or operations being used, we can identify them as CREATE or ALTER operations.

Is it possible to add (by creating) a datafile to the database without specifying the tablespace to which it belongs? That is, can you issue an ALTER DATABASE CREATE DATAFILE '<path/name>'; and expect Oracle to know what to do with this file? If so, does this work for any datafile? And, why would you do this in the first place?

After you alter the structure of a database (e.g., add a new datafile for whatever reason, add a new tablespace, or add another file to an existing tablespace), what is a best practice to follow? That’s right, take a backup. The Administrator’s Guide (10gR2) states the following no less than five times: After making any structural changes to a database, always perform an immediate and complete backup.

Let’s suppose that sometime after having added a datafile (or before having a backup) and data manipulation operations have been applied to objects whose tablespace owns the datafile of interest, you lose the datafile. Media recovery is now required. Can the data be recovered? Don’t you need a backup of the datafile to which archived redo logs are applied in order to perform media recovery? Almost every example of media recovery seems to include that part – restore a backed up copy of the datafile and then apply archived redo logs to bring the tablespace to a more current point in time.

This scenario is different – there is no backed up copy of the datafile to start with, so how can recovery be used here? The file existed once and now it does not. The control file still knows about the file, which is why you may or may not be able to open the database, or keep the database in an open state. This is where the ALTER DATABASE CREATE DATAFILE statement comes into play. You do not explicitly state the tablespace to which the datafile belongs because Oracle already knows this bit of information. Your task (one of two) in this scenario is to create a replacement file (same name or rename it, to include using a different path). Do you have to specify the size of the file? No, again, Oracle already knows this. Your other mission (two of two) is to apply archived redo logs against this filler/placeholder file.

To reiterate what must be done: create a new datafile and apply archived redo logs (using RECOVER DATAFILE). Does this work for any datafile? It does not; specifically, you cannot use this technique to recover SYSTEM tablespace datafiles. Does this work for any DML-without-a-backed-up-copy scenario? No, it does not. If the DML was not logged, then there is nothing to recover from the archived redo logs. When is, or when can, DML not be logged? That's a different topic (covered here), so for the point of this scenario, we assume that normal logging has taken place.

Seeing is believing, so let's prove that recovery can take place with an example. Create a test database (one you can afford to trash) and make the files as small as practical (we don't care about the size; it's just the fact that they exist). The database will need to be in archivelog mode. Once the database is open for business, create a new tablespace or add a datafile to an existing tablespace, AND have the file location in a place where you can replicate media failure. A flash drive is handy for this; just pull the drive when ready to simulate loss of the datafile. Another way is to shut down the database and rename the target/now-missing file. After adding the datafile, create a table, add some data to it, ensure the redo logs rotate through at least once, and then pull the drive.
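If the test database is not already in archivelog mode, enabling it is quick (standard commands, run as SYSDBA):

SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
SQL> alter database open;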

In the recovery scenario, pretending that the flash drive location is no longer available, the CREATE DATAFILE statement will use syntax like so:

ALTER DATABASE CREATE DATAFILE 'the old path/name' AS 'use a new path/name';

The media recovery step is then applied against the new datafile via:

RECOVER DATAFILE 'the new path/name';

Assuming this all goes well, what should you do when recovery is complete? That's right: take a backup. If you can recover (so easily?) this way, then why is a backup after adding a new file such a big deal? Going back to the NOLOGGING option, what if a table had been created via NOLOGGING? Or an index? Or lots and lots of each? At least with the backed up datafile, you will have captured the structure of those objects. Without the file, you would have to re-create them, which, in a recovery scenario, could add time you do not want to spend given the high degree of visibility or scrutiny you may be experiencing (ignoring the obvious question of why there wasn't a backed up copy of the file in the first place).

The figures and pictures below show the scenario of losing a datafile (I did a shutdown and deleted the file after the service was stopped).

Starting Files

FILE_NAME                                              MB
----------------------------------------------------- -----
C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\USERS01.DBF         5
C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\SYSAUX01.DBF      240
C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\UNDOTBS01.DBF      25
C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\SYSTEM01.DBF      470

Add a datafile

SQL> alter tablespace users
  2  add datafile 'C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\USERS02.DBF'
  3  size 5M;

Tablespace altered.

User DML

SQL> conn scott/tiger@demo
SQL> create table lost
  2  as select * from all_objects;

Table created.

SQL> delete from lost;

40768 rows deleted.

SQL> commit;

Commit complete.

SQL> insert into lost select * from all_objects;

40768 rows created.

SQL> commit;

Commit complete.

Switch logfiles

SQL> conn system/oracle@demo
SQL> alter system switch logfile;

System altered.


SQL> /

System altered.

SQL> /

System altered.

Shut down and start up, then check the alert log.

Alert Log - file is missing

Errors in file c:\oracle\product\10.2.0\admin\demo\bdump\demo_dbw0_3512.trc:

ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: 'C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\USERS02.DBF'
ORA-27041: unable to open file
OSD-04002: unable to open file
O/S-Error: (OS 2) The system cannot find the file specified.

Perform Recovery

SQL> conn / as sysdba
Connected.
SQL> select status from v$instance;

STATUS
------------
MOUNTED

SQL> alter database create datafile 'C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\USERS02.DBF';

Database altered.

SQL> recover datafile 'C:\ORACLE\PRODUCT\10.2.0\ORADATA\DEMO\USERS02.DBF';

ORA-00279: change 548985 generated at 07/01/2008 21:52:32 needed for thread 1
ORA-00289: suggestion :
C:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\DEMO\ARCHIVELOG\2008_07_01\O1_MF_1_2_%U_.ARC
ORA-00280: change 548985 for thread 1 is in sequence #2

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

Log applied.
Media recovery complete.
SQL> alter database open;

Database altered.

Back to normal

SQL> conn scott/tiger@demo
SQL> select count(*) from lost;

  COUNT(*)
----------
     40768

In Closing

This is actually pretty easy to practice and demonstrate, and it offers a little twist on the usual "how and when" of adding datafiles to a database. As an alternative demonstration, create a table using the NOLOGGING option (can you identify one way how?) and then apply DML, rotate the logs, and induce media failure. Without re-creating the table, can you recover now?
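As one possible answer to the "one way" question, a hedged sketch (the table name is invented): a create-table-as-select with NOLOGGING performs a direct-path load whose data blocks are not written to the redo stream, so they cannot be rebuilt from the archived logs.

SQL> create table lost_nolog nologging
  2  as select * from all_objects;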