10 Problems with your RMAN backup script

35
1 10 Problems with your RMAN backup script 1 Michael S. Abbey Presented by: Yury Velikanov Senior Oracle DBA The Pythian Group April 2012

description

Yuri is called to audit RMAN backup scripts on regular basis for several years now as part of his Day to Day duties. He see the same errors in scripts that Oracle DBAs using to backup critical databases over and over again. Those errors may play a significant role in a recovery process when you working under stress. During that presentation you will be introduced to typical issues and hints how to address those.

Transcript of 10 Problems with your RMAN backup script

Page 1: 10 Problems with your RMAN backup script

10 Problems with yourRMAN backup script

1Michael S. Abbey

Presented by: Yury VelikanovSenior Oracle DBA The Pythian GroupApril 2012

Page 2: 10 Problems with your RMAN backup script

2

WHY PYTHIAN● Recognized Leader:

– Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server

– Work with over 150 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments

● Expertise:

– One of the world’s largest concentrations of dedicated, full-time DBA expertise. Employ 7 Oracle ACEs/ACE Directors

– Hold 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC

● Global Reach & Scalability:

– 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response

Page 3: 10 Problems with your RMAN backup script

3

• Google Yury Oracle [LinkedIn, twitter, blog, email, mobile, …]

- Email me to get the presentation

• Sr. Oracle DBA at Pythian, Oracle ACE and OCM

• Started as Oracle DBA - with 7.2 (in 1997, 14+)

• First international appearance- 2005 - Hotsos Symposium 2005, …

• Education (Masters Degree in Computer science)- OCP 7/8/8i/9i/10g/11g + OCM 9i/10g/11g

• Oracle DBA consultant experience (14+ years)• Pythian Oracle Clients support (2+ years)

- 140+ Clients around the world- RMAN scripts audit, troubleshooting,

recovery

FEW WORDS ABOUT YURY

Page 4: 10 Problems with your RMAN backup script

4

Give you practical 10 hints on RMAN script improvements.

Encourage you to think on what can possibly go wrong before it happens.

Give away some prizes

MISSION

Page 5: 10 Problems with your RMAN backup script

5

● If backups and trial recovery works it doesn’t mean you don’t have issues (must test/document/practice recovery)

● Challenge your backup procedures! - Think about what can possibly go wrong- Think now as in the middle of an emergency recovery it

may be way too late or too challenging

● Prepare all you may need for smooth recovery while working on backup procedures

RIGHT APPROACH ... ! BE SKEPTICAL !

Page 6: 10 Problems with your RMAN backup script

6

FEW GENERAL THOUGHTS …

NEVER rely on backups stored on the same physical media as the database!

Mark Brinsmead, Sr. Oracle DBA, PythianEven if your storage is the fanciest disk array (misnamed "SAN" by many) in the world, there exist failure modes in which ALL data in the disk array can be lost simultaneously. (Aside from fire or other disaster, failed firmware upgrades are the most common.) You don't really have a "backup" until the backup is written to separate physical media!

Page 7: 10 Problems with your RMAN backup script

7

FEW GENERAL THOUGHTS …

Avoid situations where the loss of a single piece of physical media can destroy more than one backup.

When backing up to tape, for example, if the tape capacity is much larger than your backups, consider alternating backups between multiple tape pools. ("Self-redundant" backups are of little value if you are able to lose several consecutive backups simply by damaging one tape cartridge).

If your backup and recovery procedures violate some of the base

concepts - state risks clearly and sign/discuss those

with business on regular basis.

Page 8: 10 Problems with your RMAN backup script

8

#1 RMAN LOG FILES

Prepare all you may need for smooth recovery while working on backup procedures

part of a log file ...

RMAN>Starting backup at 18-OCT-11current log archivedallocated channel: ORA_DISK_1channel ORA_DISK_1: SID=63 device type=DISKchannel ORA_DISK_1: starting compressed archived log backup setchannel ORA_DISK_1: specifying archived log(s) in backup setinput archived log thread=1 sequence=4 RECID=2 STAMP=764855059input archived log thread=1 sequence=5 RECID=3 STAMP=764855937...Finished backup at 18-OCT-11

Do you see any issues?

Page 9: 10 Problems with your RMAN backup script

9

How about now?

part of a log file ...

RMAN> backup as compressed backupset database2> include current controlfile3> plus archivelog delete input;

Starting backup at 2011/10/18 12:30:46current log archivedallocated channel: ORA_DISK_1channel ORA_DISK_1: SID=56 device type=DISKchannel ORA_DISK_1: starting compressed archived log backup setchannel ORA_DISK_1: specifying archived log(s) in backup setinput archived log thread=1 sequence=8 RECID=6 STAMP=764856204input archived log thread=1 sequence=9 RECID=7 STAMP=764857848...Finished backup at 2011/10/18 12:33:54

#1 RMAN LOG FILES

Page 10: 10 Problems with your RMAN backup script

10

#1 RMAN LOG FILESBefore calling RMAN

• export NLS_DATE_FORMAT="YYYY/MM/DD HH24:MI:SS"• export NLS_LANG="XX.XXX_XXX"(for non standard char sets)

before running commands• set echo on

Nice to have: total execution time at the end of log filec_begin_time_sec=`date +%s`...c_end_time_sec=`date +%s`v_total_execution_time_sec=`expr ${c_end_time_sec} - ${c_begin_time_sec}`

echo "Script execution time is $v_total_execution_time_sec seconds"

Page 11: 10 Problems with your RMAN backup script

11

#1 RMAN LOG FILES● do not overwrite log file from previous backup

full_backup_${ORACLE_SID}.`date +%Y%m%d_%H%M%S`.log

Use case: a backup failed• should I run the backup now?• would it interfere with business activities?

Page 12: 10 Problems with your RMAN backup script

14

KISS = KEEP IT STUPID SIMPLE

crosscheck archivelog all;delete noprompt expired archivelog all;

backup database include current controlfile plus archivelog delete input;

delete noprompt obsolete;

& NoYes!?

Page 13: 10 Problems with your RMAN backup script

15

#2 DO NOT USE CROSSCHECK (ANTI KISS)

● Do not use CROSSCHECK in your day to day backup scripts!

● If you do, RMAN silently ignores missing files, possibly making your recovery impossible

● CROSSCHECK should be a manual activity executed by a DBA to resolve an issue

Page 14: 10 Problems with your RMAN backup script

16

#3 BACKUP CONTROL FILE AS THE LAST STEP

backup as compressed backupset database plus archivelog delete inputinclude current controlfile;

delete noprompt obsolete;

exit

We are making the controlfile backup inconsistent immediately

Is it right?

Page 15: 10 Problems with your RMAN backup script

17

#3 BACKUP CONTROL FILE AS THE LAST STEP

backup as compressed backupset database plus archivelog delete input;

delete noprompt obsolete;

backup spfile;

backup current controlfile;

exit

Is it right?

Page 16: 10 Problems with your RMAN backup script

18

#4 DO NOT RELY ON ONE BACKUP ONLY● Do not rely on ONE backup only!

• You should always have a second option● Especially true talking about ARCHIVE LOGS

• If you miss a single ARCHIVE LOG your recoverability is compromised

-- ONE COPY ONLYBACKUP DATABASE ... PLUS ARCHIVELOG DELETE INPUT;-- SEVERAL COPIESBACKUP ARCHIVELOG ALL NOT BACKED UP $v_del_arch_copies TIMES;

Page 17: 10 Problems with your RMAN backup script

19

#5 DO NOT DELETE ARCHIVE LOGS BASED ON TIME ONLY-- TIMESTAMPDELETE NOPROMPT BACKUP OF ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-6/24' DEVICE TYPE DISK;

-- SEVERAL COPIES + TIMEDELETE NOPROMPT ARCHIVELOG ALL BACKED UP $v_del_arch_copies TIMES TO DISK COMPLETED BEFORE '$p_start_of_last_db_backup';

-- SEVERAL COPIES + TIME + STANDBYCONFIGURE ARCHIVELOG DELETION POLICY TO APPLIED ON STANDBY;

Page 18: 10 Problems with your RMAN backup script

20

#6 USE CONTROLFILE IF CATALOG DB ISN'T AVAILABLE[oracle@host01 ~]$ rman target / catalog rdata/xxxRecovery Manager: Release 11.2.0.2.0 - Production on Tue Oct 18 15:15:25 2011...connected to target database: PROD1 (DBID=1973883562)RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-00554: initialization of internal recovery manager package failedRMAN-04004: error from recovery catalog database: ORA-28000: the account is locked[oracle@host01 ~]$● Check if catalog DB is available in your script

• If it is, connect to catalog DB• If it isn’t, use controlfile only (flagging it as warning)

Page 19: 10 Problems with your RMAN backup script

21

#6 USE CONTROLFILE IF CATALOG DB ISN'T AVAILABLErman target /RMAN> echo set onRMAN> connect target *connected to target database: PROD1 (DBID=1973883562)RMAN> connect catalog *RMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-04004: error from recovery catalog database: ORA-28000: the account is locked

RMAN> backup as compressed backupset database2> include current controlfile3> plus archivelog delete input;

Starting backup at 2011/10/18 15:22:30current log archived using target database control file instead of recovery catalog

special THX 2 @pfierens 4 discussion in tweeter

Page 20: 10 Problems with your RMAN backup script

22

-- Backup partrman target / <<!backup as compressed backupset database...!

-- Catalog synchronization partrman target / <<!connect catalog rdata/xxxresync catalog;!

special THX 2 @martinberx 4 discussion in tweeter

#6 USE CONTROLFILE IF CATALOG DB ISN'T AVAILABLE

Page 21: 10 Problems with your RMAN backup script

23

#7 DO NOT RELY ON RMAN STORED CONFIGURATION

● Do not rely on controlfile autobackup

CONFIGURE CONTROLFILE AUTOBACKUP ON;CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/b01/rman/prod/%F’;

● Oracle creates a controlfile backup copy● each time you make any db files related changes● at the end of each RMAN backup ● What would happen if someone switched autobackup off?

Page 22: 10 Problems with your RMAN backup script

24

● Document configuration in a log file (show all;)● If you change configuration restore it at the end of your script

show all;v_init_rman_setup=`$ORACLE_HOME/bin/rman target / <<_EOF_ 2>&1|grep "CONFIGURE " |sed s/"# default"/""/gshow all;_EOF_`

...< script >...

echo $v_init_rman_setup | $ORACLE_HOME/bin/rman target /

#7 DO NOT RELY ON RMAN STORED CONFIGURATION

Page 23: 10 Problems with your RMAN backup script

25

#8 BACKUPS’ CONSISTENCY CONTROL

● How do you report backup failures and errors?● We don’t report at all● DBA checks logs sometimes● Backup logs are sent to a shared email address (good!)● DBA on duty checks emails (what if no one available/no email

received?)● We check RMAN command errors code $? and sending email

Failure Verification and Notification

Page 24: 10 Problems with your RMAN backup script

26

● I would suggest• Run log files check within backup script and page immediately• Script all checks and use "OR" in between

• echo $?• egrep "ORA-|RMAN-" < log file >• Improve your scripts and test previous adjustments on regular basis

• PAGE about any failure to oncall DBA immediately• DBA makes a judgment and takes a conscious decision

• PAGE about LONG running backups

Failure Verification and Notification

#8 BACKUPS’ CONSISTENCY CONTROL

Page 25: 10 Problems with your RMAN backup script

27

● How do you check if your database is safely backed up based on your business requirements?

● Make a separate check that would page you if your backups don’t satisfy recoverability requirements

REPORT NEED BACKUP ...-- datafiles that weren’t backed up last 24 hours! – a bit excessive (2)REPORT NEED BACKUP RECOVERY WINDOW OF 1 DAYS;REPORT NEED BACKUP REDUNDANCY 10;REPORT NEED BACKUP DAYS 2;REPORT UNRECOVERABLE;

Notifications are not enough!

May not available in all Versions!

#8 BACKUPS’ CONSISTENCY CONTROL

Page 26: 10 Problems with your RMAN backup script

28

IF YOU DON’T HAVE RMAN AND MML INTEGRATION

● You should consider using it!• Otherwise it is extremely difficult to ensure backup consistency

● If you don’t use it then your backups are exposed to many issues• At best, your backups will take much more space on tapes than should• In worst case you may miss to backup some of the backup pieces , putting

the database recoverability at risk

● The next few slides discuss some issues

Page 27: 10 Problems with your RMAN backup script

29

#9 ENSURE 3 TIMES FULL BACKUPS SPACE + ARCH● IF you don’t have

● A smart backup software (incremental/opened files)● Sophisticated backup procedures

● THEN you need space on a file system for at least 3 FULL backups and ARCHIVE LOGS generated in between 3 backups

● If REDUNDANCY 1 then previous backup and ARCHIVE LOGS got removed after next backup is completed. There is no continued REDO stream on tapes.

● If REDUNDANCY 2 then you need space for third full backup of backup time only (as soon as third backup completed you remove the first one)

Page 28: 10 Problems with your RMAN backup script

30

#9 DON’T USE “DELETE OBSOLETE” (DISK + NO MML TAPE)● This way you wipe out RMAN memory. There is no way RMAN knows about backups available on

tapes.● Think about recovery (if you use “delete noprompt obsolete”)

1. You need to recover a control file (possibly from offsite backups)2. Find and bring onsite all tapes involved (possibly several iterations)3. Restore and recover (possibly restoring more ARCH backups)

backup as compressed backupset database plus archivelog delete inputinclude current controlfile;delete noprompt obsolete;exit

Page 29: 10 Problems with your RMAN backup script

31

-A- LIST BASED ON DISK RETENTIONreport obsolete recovery window of ${DISK_RETENTION_OS} days device type disk;

-B- REMOVE FILES BASED ON DISK RETENTION !checking if each of reported files have been backed up to tapes & rm it from FS!

-C- WIPEOUT FROM REPOSITORYdelete force noprompt obsolete recovery window of ${TAPE_DAY_RETENTION} days device type disk;

-!- AT THE RECOVERY TIMERUN{SET UNTIL SCN 898570;RESTORE DATABASE PREVIEW;}

#9 DON’T USE “DELETE OBSOLETE” (DISK + NO MML TAPE)

Page 30: 10 Problems with your RMAN backup script

33

#9 NEVER KEEP DEFAULT RETENTION POLICY● NEVER allow the RMAN RETENTION POLICY to remain

at the default or lower level than TAPE retention• other Oracle DBA can run DELETE OBSOLETE command

and wipe all catalog records out

CONFIGURE RETENTION POLICY TO REDUNDANCY 1000;CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 1000 DAYS;

Page 31: 10 Problems with your RMAN backup script

34

#10 HALF WAY BACKED UP FILE SYSTEM FILES

● Make sure that your File System backup doesn’t backup half finished backupset

-A- BACKUP AS TMPBACKUP DATABASE FORMAT '${file}.tmp_rman';

-B- MOVE TO PERMANENTmv ${file}.tmp_rman ${file}.rman

-C- MAKE CATALOG AWARECHANGE BACKUPPIECE '${file}.tmp_rman' UNCATALOG;CATALOG BACKUPPIECE '${file}.rman';

Page 32: 10 Problems with your RMAN backup script

35

DO WE HAVE A WINNER?#1 RMAN Log files#2 Do not use CROSSCHECK#3 Backup control file as the last step#4 Do not rely on ONE backup only#5 Do not delete ARCHIVE LOGS based on time only#6 Use controlfile if catalog DB isn't available#7 Do not rely on RMAN stored configuration#8 Backups’ consistency control#9 Don’t use “delete obsolete” (disk + no mml tape)#10 Half way backed up File System files

Page 33: 10 Problems with your RMAN backup script

36

CONTACT ME

[email protected]

@yvelikanov

www.pythian.com/news/velikanov

Yury Oracle

Page 34: 10 Problems with your RMAN backup script

37

THANK YOU AND Q&A

37

http://www.pythian.com/news/

http://www.facebook.com/pages/The-Pythian-Group/

http://twitter.com/pythian

http://www.linkedin.com/company/pythian

1-866-PYTHIAN

[email protected]

To contact us…

To follow us…

Page 35: 10 Problems with your RMAN backup script

38

ADDITIONAL TOPICS #A DELETE OBSOLETE to be executed at the begging#B CATALOG or not to CATALOG#C Number of backup & recovery processes (# of backup pieces)#D CATALOG Keep as less information as reasonable#E “rman target / catalog rdata/xxx” – isn’t secure