Friday, May 6, 2011

Rman Data Recovery Advisor in Oracle 11g

The Data Recovery Advisor automatically diagnoses corruption or loss of persistent data on disk, determines the appropriate repair options, and executes repairs at the user's request. This reduces the complexity of recovery process, thereby reducing the Mean Time To Recover (MTTR). The advisor comes in two flavors: command line mode and as a screen in Oracle Enterprise Manager Database Control.

Below are the following command used in  Data  Recovery Advisor

1.     LIST FAILURE
2.     LIST FAILURE DETAILS
3.     ADVISE FAILURE
4.     REPAIR FAILURE

Before we can start identifying and repairing failures, we need to damage a datafile.In  this scenario, I have  shut my database and open one of the datafile(user01.dbf) with wordpad(os utility )  and edit two letter and save it , and then open the database and got following error message.

C:\>sqlplus sys/ramtech@noida as sysdba
SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 6 14:11:23 2011
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> startup
ORACLE instance started.
Total System Global Area  426852352 bytes
Fixed Size                  1333648 bytes
Variable Size             318768752 bytes
Database Buffers          100663296 bytes
Redo Buffers                6086656 bytes
Database mounted.
ORA-01157 : cannot identify/lock data file 4 - see DBWR trace file
ORA-01110 : data file 4: 'D:\ORACLE\ORADATA\NOIDA\USERS01.DBF'

Since the   error has occurred , so we want to find out what happened. So we connect to RMAN and check the failure.

C:\>rman  target  sys/ramtech@noida
Recovery Manager: Release 11.1.0.6.0 - Production on Fri May 6 14:16:06 2011
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
connected to target database: NOIDA (DBID=1503672566, not open)

LIST FAILURE   : If there is no error, this command will come back with the message: "no failures found that match specification " and if there is an error, a more explanatory message will follow:

RMAN> list failure;
using target database control file instead of recovery catalog
List of Database Failures
========================
Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
382        HIGH     OPEN      06-MAY-11     One or more non-system datafiles are corrupt

This message shows that some datafiles are corrupt . As the datafiles belong to a tablespace other than SYSTEM, the database stays up with that tablespace being offline. This error is fairly critical, so the priority is set to HIGH. Each failure gets a Failure ID, which makes it easier to identify and address individual failures. For instance we can issue the following command to get the details of Failure 382.

LIST FAILURE DETAILS :   This command will show us the exact cause of the error.This command will give the details about the failure id i.e, 382  

RMAN> list failure 382 detail;
List of Database Failures
=========================
Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
382        HIGH     OPEN      06-MAY-11     One or more non-system datafiles are corrupt
  Impact: See impact for individual child failures
  List of child failures for parent failure ID 382
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  385        HIGH     OPEN      06-MAY-11     Datafile 4: 'D:\ORACLE\ORADATA\NOIDA\USERS01.DBF' is corrupt
 Impact: Some objects in tablespace USERS might be unavailable

ADVISE FAILURE  :  It responds with a detailed explanation of the error and how to correct it:

RMAN> advise failure;
List of Database Failures
=========================
Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
382        HIGH     OPEN      06-MAY-11     One or more non-system datafiles are corrupt
  Impact: See impact for individual child failures
  List of child failures for parent failure ID 382
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  385        HIGH     OPEN      06-MAY-11     Datafile 4: 'D:\ORACLE\ORADATA\NOIDA\USERS01.DBF' is corrupt
    Impact: Some objects in tablespace USERS might be unavailable

analyzing automatic repair options; this may take some time
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=155 device type=DISK
analyzing automatic repair options complete
Mandatory Manual Actions
========================
no manual actions available
Optional Manual Actions
=======================
no manual actions available
Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Restore and recover datafile 4
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: d:\oracle\diag\rdbms\noida\noida\hm\reco_1928090031.hm

This output has several important parts. First, the advisor analyzes the error. In this case, it's pretty obvious: the datafile is corrupt . Next, it suggests a strategy. In this case, this is fairly simple as well: restore and recover the file. The dynamic performance view V$IR_MANUAL_CHECKLIST also shows this information.
However, the most useful task Data Recovery Advisor does is shown in the very last line: it generates a script that can be used to repair the datafile or resolve the issue. The script does all the work; we don't have to write a single line of code.
Sometimes the advisor doesn't have all the information it needs. For instance, in this case, it does not know if someone moved the file to a different location or renamed it. In that case, it advises to move the file back to the original location and name (under Optional Manual Actions).
So the script is prepared for us  . I would verify what the script actually does first. So, I issue the following command to "preview" the actions the repair task will execute:

RMAN> repair failure preview ;

Strategy: The repair includes complete media recovery with no data loss
Repair script: d:\oracle\diag\rdbms\noida\noida\hm\reco_1928090031.hm
contents of repair script:
   # restore and recover datafile
   restore datafile 4;
   recover datafile 4;

This is good; the repair seems to be doing the same thing I would have done myself using RMAN. Now I can execute the actual repair by issuing:

REPAIR FAILURE  :  This command will execute the above script. After recovery the tablespace it is prompt for  opening  the database .

RMAN> repair failure;
Strategy: The repair includes complete media recovery with no data loss
Repair script: d:\oracle\diag\rdbms\noida\noida\hm\reco_1928090031.hm
contents of repair script:
   # restore and recover datafile
   restore datafile 4;
   recover datafile 4;

Do you really want to execute the above repair (enter YES or NO)? yes
executing repair script
Starting restore at 06-MAY-11
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to D:\ORACLE\ORADATA\NOIDA\USERS01.DBF
channel ORA_DISK_1: reading from backup piece D:\RMAN_BKP\03MBDHJ7_1_1
channel ORA_DISK_1: piece handle=D:\RMAN_BKP\03MBDHJ7_1_1 tag=TAG20110503T141047
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03
Finished restore at 06-MAY-11
Starting recover at 06-MAY-11
using channel ORA_DISK_1
starting media recovery
archived log for thread 1 with sequence 20 is already on disk as file D:\ARCHIVE\NOIDA_20_1_749730106.ARC
archived log for thread 1 with sequence 1 is already on disk as file D:\ARCHIVE\NOIDA_1_1_750184743.ARC
archived log for thread 1 with sequence 2 is already on disk as file D:\ARCHIVE\NOIDA_2_1_750184743.ARC
archived log for thread 1 with sequence 3 is already on disk as file D:\ARCHIVE\NOIDA_3_1_750184743.ARC
archived log for thread 1 with sequence 4 is already on disk as file D:\ARCHIVE\NOIDA_4_1_750184743.ARC
archived log for thread 1 with sequence 5 is already on disk as file D:\ARCHIVE\NOIDA_5_1_750184743.ARC
archived log file name=D:\ARCHIVE\NOIDA_20_1_749730106.ARC thread=1 sequence=20
archived log file name=D:\ARCHIVE\NOIDA_1_1_750184743.ARC thread=1 sequence=1
archived log file name=D:\ARCHIVE\NOIDA_2_1_750184743.ARC thread=1 sequence=2
archived log file name=D:\ARCHIVE\NOIDA_3_1_750184743.ARC thread=1 sequence=3
media recovery complete, elapsed time: 00:00:13
Finished recover at 06-MAY-11
repair failure complete
Do you want to open the database (enter YES or NO)? Y
database opened

Note how RMAN prompts us before attempting to repair. In a scripting case, we may not want to do that; rather, we would want to just go ahead and repair it without an additional prompt. In such a case, just use repair failure noprompt at the RMAN prompt.

Several views have been added to Oracle 11g to support the Data Recovery Advisor. The following views are available:
  • V$IR_FAILURE  - This view provides information on the failure. Note that records in this view can be hierarchal.
  • V$IR_FAILURE_SET -  This view provides a list of the various advice records associated with the failure. we can use this view to join the V$IR_FAILURE to the V$IR_MANUAL_CHECKLIST view.
  • V$IR_ MANUAL_CHECKLIST - This view provides detailed informational messages related to the failure. These messages provide information on how to manually correct the problem.
  • V$IR_REPAIR - This view, when joined with V$IR_FAILURE and V$IR_FAILURE_SET, can be used to provide a pointer to the physical file created by Oracle that contains the repair steps required to correct a detected error.
Enjoy    J J J

No comments: