Wednesday, April 6, 2011

Understanding the SCN

In order to understand how Oracle performs recovery, it’s first necessary to understand Oracle’s SCN (System Change Number ) in terms of the various places where it can be stored and how it’s used for instance and media recovery.

The SCN is an internal number maintained by the database management system (DBMS) to log changes made to a database. The SCN increases over time as changes are made to the database by Structured Query Language (SQL). By understanding how the SCN is used, we can understand how Oracle recovery works. Oracle enables us to examine the current SCN using the following SQL as 

SQL> select dbms_flashback.get_system_change_number from dual ; 


Whenever an application commits a transaction, the log writer process (LGWR) writes records from the redo log buffers in the System Global Area (SGA) to the online redo logs on disk. LGWR also writes the transaction’s  SCN to the online redo log file. The success of this atomic write event determines whether our transaction succeeds, and it requires a synchronous (wait-until-completed) write to disk


Note : The need for a synchronous write upon commit is one of the reasons why the online redo log can become a bottleneck for applications and why we should commit as infrequently as is practical. In general, Oracle writes asynchronously to the database datafiles for performance reasons, but commits require a synchronous write because they must be guaranteed at the time they occur.

SCN and Checkpoints : 
A checkpoint occurs when all modified database buffers in the Oracle SGA are written out to datafiles by the database writer (DBWn) process. The checkpoint process (CKPT) updates all datafiles and control files with the SCN at the time of the checkpoint and signals DBWn to write out the blocks. A successful checkpoint guarantees that all database changes up to the checkpoint SCN have been recorded in the datafiles. As a result, only those changes made after the checkpoint need to be applied during recovery. Checkpoints occur automatically as follows:

  • Whenever a redo log switch takes place.
  • Whenever the time set by the LOG_CHECKPOINT_TIMEOUT initialization parameter is reached.
  • Whenever the amount of redo written reaches the number of bytes associated with the LOG_CHECKPOINT_INTERVAL.
Typically, LOG_CHECKPOINT_INTERVAL is chosen so that checkpoints only occur on log switches. Oracle stores the SCN associated with the checkpoint in four places: three of them in the control file and one in the datafile header for each datafile.

The System Checkpoint SCN :
After a checkpoint completes, Oracle stores the system checkpoint SCN in the control file. We can access the checkpoint SCN using the following SQL as

SQL> select checkpoint_change# from v$database ; 
      CHECKPOINT_CHANGE#
        ------------------------------
              292767

The Datafile Checkpoint SCN :
After a checkpoint completes, Oracle stores the SCN individually in the control file for each datafile. The following SQL shows the datafile checkpoint SCN for a single datafile in the control file:

SQL> select   name,  checkpoint_change#   from  v$datafile  where  name  like  '%USER%'  ; 
 NAME                                               CHECKPOINT_CHANGE#
--------------------------------------------          --------------------
/u02/oradata/OMFD1/users01.dbf                  292767

The Start SCN :
Oracle stores the checkpoint SCN value in the header of each datafile. This is referred to as the start SCN because it is used at instance startup time to check if recovery is required. The following SQL shows the checkpoint SCN in the datafile header for a single datafile:

SQL> select name,checkpoint_change# from v$datafile_header where name like '%USERS01%' ;
 NAME                                                           CHECKPOINT_CHANGE#
---------------------------------------------          --------------------------
/u02/oradata/OMFD1/users01.dbf                           292767

The Stop SCN : 
The stop SCN is held in the control file for each datafile. The following SQL shows the stop SCN for a single datafile when the database is open for normal use:

SQL> select name,last_change# from v$datafile where name like  '%USERS01%' ;
 NAME                                                             LAST_CHANGE#
----------------------------------------------                ------------
/u02/oradata/OMFD1/users01.dbf  

During normal database operation, the stop SCN is NULL for all datafiles that are online in read-write mode. SCN Values while the Database Is Up Following a checkpoint while the database is up and open for use, the system checkpoint in the control file, the datafile checkpoint SCN in the control file, and the start SCN in each datafile header all match. The stop SCN for each datafile in the control file is NULL. SCN after a Clean Shutdown After a clean database shutdown resulting from a SHUTDOWN IMMEDIATE or  SHUTDOWN NORMAL of the database, followed by STARTUP MOUNT, the previous queries on v$database and v$datafile return the following:

SQL> select checkpoint_change# from v$database ; 
      CHECKPOINT_CHANGE#
    -----------------------------
                  293184

SQL> select  name,checkpoint_change#, last_change#   from v$datafile where  name  like  '%USER%';
 NAME                                                    CHECKPOINT_CHANGE#   LAST_CHANGE#
------------------------------------------             ------------------------              --------------
/u02/oradata/OMFD1/users01.dbf                      293184                     293184

SQL> select name,checkpoint_change# from v$datafile_header where name like '%USERS%' ;
 NAME                                                          CHECKPOINT_CHANGE#
--------------------------------------------       -----------------------------
/u02/oradata/OMFD1/users01.dbf                             293184

During a clean shutdown, a checkpoint is performed and the stop SCN for each datafile is set to the start  SCN from the datafile header. Upon startup, Oracle checks the start SCN in the file header with the datafile checkpoint SCN. If they match, Oracle checks the start SCN in the datafile header with the datafile stop SCN in the control file. If they match, the database can be opened because all block changes  have been applied, no changes were lost on shutdown, and therefore no recovery is required on startup. After the database is opened, the datafile stop SCN in the control file once again changes to NULL to indicate  that the datafile is open for normal use.


SCN after an Instance Crash :
The previous example showed the behavior of the SCN after a clean shutdown. To demonstrate the behavior of the checkpoints after an instance crash, the following SQL creates a table (which performs an implicit commit) and inserts a row of data into it without a commit:

SQL> create   table   x(x number)   tablespace   users ;  
SQL> insert   into   x   values  (100)  ; 

If the instance is crashed by using SHUTDOWN ABORT, the previous queries on v$database and v$datafile return the following after the database is started up in mount mode :

SQL> select checkpoint_change# from v$database;
  CHECKPOINT_CHANGE#
   --------------------------------
              293185

SQL> select name,checkpoint_change#,last_change# from v$datafile where name like  '%USER%'  ;
NAME                                                       CHECKPOINT_CHANGE#   LAST_CHANGE#
-----------------------------------                   ---------------------------          -------------------
/u02/oradata/OMFD1/users01.dbf                  293185        

In this case, the stop SCN is not set, which is indicated by the NULL value in the LAST_CHANGE# column. This information enables Oracle, at the time of the next startup, to determine that the instance crashed because the checkpoint on shutdown was not performed. If it had been performed, the LAST_CHANGE# and CHECKPOINT_CHANGE# values would match for each datafile as they did during a clean shutdown. If an instance crashes at shutdown, then instance crash recovery is required the next time the instance starts up.

Recovery from an Instance Crash :

Upon the next instance startup that takes place after SHUTDOWN ABORT or a DBMS crash, the Oracle DBMS detects that the stop SCN for datafiles is not set in the control file during startup. Oracle then performs   crash recovery. During crash recovery, Oracle applies redo log records from the online redo  logs in a process referred to as roll forward to ensure that all transactions committed before the crash are applied to the datafiles. Following roll forward, active transactions that did not commit are   identified from the rollback segments and are undone before the blocks involved in the active transactions  can be accessed . This process is referred to as roll back. In our example, the following transaction was active but not committed at the time of the SHUTDOWN ABORT, so it needs to be rolled back:


SQL> insert into x values(100);

After  instance startup,  the X table exists,  but  remains empty. Instance     recovery happens    automatically at database startup without   database administrator (DBA) intervention.   It may   take   a while because   of the    need to    apply     large    amounts   of outstanding redo changes to data    blocks    for transactions   that completed and those that didn’t complete and require roll back.

Recovery from a Media Failure :
Up   until   this   point, the  checkpoint   start   SCN in  the  datafile  header  has always  matched  the datafile checkpoint  SCN  number held  in  the control file. This is reasonable  because  during a  checkpoint,  the  datafile  checkpoint  SCN in  the control  file and  the start SCN  in the  datafile  header are  both   updated, along  with   the  system  checkpoint  SCN.  The  following  SQL  shows  the  start  SCN  from  the  datafile header  and  datafile  checkpoint SCN from  the  control file  for the  same file:

SQL> select 'controlfile' "SCN location",name,checkpoint_change# from v$datafile where name like '%USER%'
union
select 'file header',name,checkpoint_change# from v$datafile_header where name like '%USER%' ;

SCN location                      NAME                                        CHECKPOINT_CHANGE#
--------------     -------------------------------------------       --------------------
controlfile          /u02/oradata/OMFD1/users01.dbf                      293188
file header        /u02/oradata/OMFD1/users01.dbf                       293188

Unlike the v$datafile view, there is no stop SCN column in the v$datafile_header view because v$datafile_header is not used at instance startup time to indicate that an instance crash occurred. However, the v$datafile_header does provide the Oracle DBMS with the information it requires to perform media recovery. At instance startup, the datafile checkpoint SCN in the control file and the start SCN in the datafile header are checked for equality. If they don’t match, it is a signal that media recovery is required.

For example, media recovery is required if a media failure has occurred and the original datafile has been replaced with a backup copy. In this case, the start SCN in the backup copy is less than the checkpoint SCN value in the control file, and Oracle requests archived redo logs—generated at the time of previous log switches—in order to reapply the changes required to bring the datafile up to the current point in time.

In order to recover the database from a media failure, we must run the database in ARCHIVELOG mode to ensure that all database changes from the online redo logs are stored permanently in archived redo log files. In order to enable ARCHIVELOG mode, we must run the command ALTERDATABASE ARCHIVELOG when the database is in a mounted state.

We can identify files that need recovery after we have replaced a datafile with an older version by starting the instance in mount mode and running the following SQL:

SQL> select file#,change# from v$recover_file;
     FILE#         CHANGE#
   ----------     ----------
         4            313401

In this example, file 4 is the datafile in the USERS tablespace. By reexecuting the previous SQL to display the datafile checkpoint SCN in the control file and the start SCN in the datafile header, we can see that the start SCN is older due to the restore of the backup datafile that has taken place:

SQL> select 'controlfile' "SCN location",name,checkpoint_change#
from v$datafile where name like '%USER%'
union
select 'file header',name,checkpoint_change#
from v$datafile_header where name like '%USER%';

SCN location           NAME                                                   CHECKPOINT_CHANGE#
--------------    -------------------------------------------      --------------------
controlfile        /u02/oradata/OMFD1/users01.dbf                         313551
file header       /u02/oradata/OMFD1/users01.dbf                         313401

If we were to attempt to open the database, we  would receive errors like the following:
ORA-01113: file 4 needs media recovery
ORA-01110: datafile 4: '/u02/oradata/OMFD1/users01.dbf'

We   can recover the   database by issuing   RECOVER   DATABASE   from SQL*Plus while the   database  is in a mounted state. If  the  changes  needed  to  recover the  database  to the point in time   before the crash are  in an archived redo log,  then we will be prompted to accept the suggested name:

ORA-00279: change 313401 generated at 11/10/2001 18:50:23 needed for thread
ORA-00289: suggestion : /u02/oradata/OMFD1/arch/T0001S0000000072.ARC
ORA-00280: change 313401 for thread 1 is in sequence #72

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

If  we respond to  the prompt  using  AUTO, Oracle  applies  any archived  redo logs  it needs,  followed  by any  necessary changes  in the online redo logs, to bring the database right up to the last committed transaction before the  media failure that caused  the  requirement for  the  restore.

So far, we’ve considered  recovery scenarios  where the  goal is  to  recover  the database to  the most recent  transaction. This  is known as complete recovery. The RECOVER DATABASE command has several  other  options  that  enable  us to  recover  from a  backup to a point in  time before  the  most  recent  transaction  by  rolling  forward and then stopping the application of the redo log changes at a specified point. This is known as  incomplete recovery. We can specify a time or an SCN as the recovery point. For example,

RMAN> recover database until time '2001-11-10:18:52:00';
RMAN> recover database until change 313459;

Before  we perform incomplete recovery,  it’s recommended  that we restore  a complete database backup first. After incomplete recovery, we must open the mounted database with ALTER DATABASE OPEN RESETLOGS. This creates a new incarnation of the database and clears the contents of the existing redo logs to make sure they can’t be applied.

Recovery from a Media Failure Using a Backup Control File :

In  the previous example, we had access to a current control file at the time of the media failure. This  means that  none  of the  start SCN  values  in the datafile  headers exceeded  the system checkpoint  SCN number in  the control file.  To recap,  the  system  checkpoint  number  is  given  by  the  following  :

SQL> select checkpoint_change# from v$database;

We   might  be wondering  why  Oracle  needs to maintain  the last system   checkpoint value in the control file  as  well  as  checkpoint  SCNs  in the control file for each datafile (as used in the previous example). There are two reasons  for this. The first  is  that  we  might  have  read-only  tablespaces  in  our  database. In this  case, the database  checkpoint  SCN increases, and  the checkpoint  SCN for the datafiles in the read-only tablespace remains frozen in the control file.

The  following  SQL  report  output  shows  a  database  with a read-write tablespace (USERS)               and  read-only  tablespace (TEST). The  start SCN  in the file  header  and the checkpoint SCN in the control  file for  TEST  are less  than the system checkpoint value. Once a tablespace is read only, checkpoints  have no  effect on the files in it. The other read-write tablespace has checkpoint values that match the system checkpoint:

SCN location         NAME                             CHECKPOINT_CHANGE#
-------------------- ---------------------------------- ----------------
controlfile          SYSTEM checkpoint                           355390
file header          /u02/oradata/OD2/users01.dbf                355390
file in controlfile  /u02/oradata/OD2/users01.dbf                355390
file header          /u02/oradata/OD2/test01.dbf                 355383
file in controlfile  /u02/oradata/OD2/test01.dbf                 355383

The   second  reason  for  the  maintenance  of multiple  checkpoint  SCNs   in  the  control  file  is  that we might  not have  a current  control  file available  at  recovery   time. In  this  case, we need to restore  an earlier   control  file  before  we can perform a recovery. The system checkpoint in the control file may indicate  an earlier  change  than  the  start  SCN  in  the  datafile  headers.

The following SQL shows an example where the system checkpoint SCN and datafile checkpoint SCN indicate an earlier change than the start SCN in the datafile header:

SQL> select 'controlfile' "SCN location",'SYSTEM checkpoint' name,checkpoint_change# 
from v$database 
union
select 'file in controlfile',name,checkpoint_change# 
from v$datafile where name like 'users01%' 
union
select 'file header',name,checkpoint_change# 
from v$datafile_header where name like '%USER%' ; 

SCN location                 NAME                                 CHECKPOINT_CHANGE#
------------------- ------------------------------           ------------------
controlfile               SYSTEM checkpoint                            333765
file header             /u02/oradata/OD2/users01.dbf                 355253
file in controlfile    /u02/oradata/OD2/users01.dbf                 333765

If we try to recover a database in the usual way in this situation, Oracle detects that the control file is older than some of the datafiles, as indicated by the checkpoint SCN values in the datafile headers, and reports the following message:

SQL> recover database ; 
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

If  we  want to proceed with  recovery in this situation, we need to indicate to Oracle that  a noncurrent control  file  possibly containing  mismatches  in the SCN  values  identified  by  the   previous  error  messages  is  about to  be specified  for recovery  by  using  the  following command:

RMAN> recover database using BACKUP CONTROLFILE  ; 

Reference:   http://www.dbapool.com/


Enjoy     :-) 


No comments: