11 October 2013

UNTIL CANCEL UNTIL CANCEL UNTIL CANCEL

Yesterday I was creating a new Oracle 11.2.0.3 database from a copy of datafiles and archivelogs taken from our standby. I was sure to include archivelogs from just prior to well after the span of the datafile backup time. I had created a new controlfile, gotten everything mounted and recovered all of the archivelogs I had using "recover database using backup controlfile;". The next step was to open the new database with reset logs, right? WRONG.

Here is my sqlplus session showing the end of the recovery and my attempt to open the database:

ORA-00279: change 506393850182 generated at 10/04/2013 20:54:13 needed for
thread 3
ORA-00289: suggestion :
/mnt/prod/fra/PROD/archivelog/2013_10_10/o1_mf_3_36561_%u_.arc
ORA-00280: change 506393850182 for thread 3 is in sequence #36561
ORA-00278: log file
'/mnt/
prod/fra/arch/arch_D-PROD_id-3948365078_S-36560_T-3_A-767893550_nkom2l82'
no longer needed for this recovery


ORA-00308: cannot open archived log
'/mnt/
prod/fra/PROD/archivelog/2013_10_10/o1_mf_3_36561_%u_.arc'
ORA-17503: ksfdopn:4 Failed to open file
/mnt/
prod/fra/PROD/archivelog/2013_10_10/o1_mf_3_36561_%u_.arc
ORA-17500: ODM err:File does not exist

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1:
'/mnt/
prod/data/PROD/datafile/data_D-PROD_I-3948365078_TS-SYSTEM_FNO-1_c4olc3jf'



This didn't make sense to me. I had been sure to include and recover archivelogs from beyond the database copy backup. No datafiles in v$datafile were listed as fuzzy. Everything should be consistent! On a whim (and out of ideas), I decided to restore 15 more archivelogs from the standby, fed them to the new database but still same result.

Then I looked at how that recovery phase ended and grew suspicious. I had just used the "recover database using backup controlfile;" syntax because that was listed in the trace controlfile I had based my scripts off of. I noticed now that it didn't have an "UNTIL CANCEL" clause at the end that I had used years in the past when doing recoveries like this. So I decided to test this by issuing the command with that clause and just cancelling right away:


SQL> recover database using backup controlfile until cancel;
ORA-00279: change 506393850182 generated at 10/04/2013 20:54:13 needed for
thread 3
ORA-00289: suggestion :
/mnt/
prod/fra/PROD/archivelog/2013_10_10/o1_mf_3_36561_%u_.arc
ORA-00280: change 506393850182 for thread 3 is in sequence #36561


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
CANCEL
Media recovery cancelled.

SQL> alter database open resetlogs;

Database altered.

Bingo, works like a charm. My guess is that if I had used UNTIL CANCEL in my initial recovery, it would have ended cleanly when it ran out of archivelogs to recover and I would have been able to open the database then as well.

However I have a beef with what is, in my humble opinion, a misleading error message. That system datafile really didn't need any more recovery. To me this is basically a bug in Oracle over semantics in how the log recovery ended. I'd be interested to know what others think, please comment below.

But for now, I'll be sure to repeat "UNTIL CANCEL" to myself when manually recovering archivelogs like this.

5 comments:

  1. I think it's more generally that Oracle can't reconcile whatever file actually can't be completely defuzzed, the controlfile, and the system file, so it arbitrarily says data file 1.

    Kind of hard to really have an good error message when you've told it everything should be there and that's not true, though it would be better if it barfed up what it was looking for in all those places.

    ReplyDelete
  2. I don't see a bug here, it's well documented behaviour:
    due to http://docs.oracle.com/cd/E11882_01/backup.112/e10643/rcmsynta2001.htm#i84135
    "... By default, RMAN performs complete recovery. ..." , which by http://docs.oracle.com/cd/E11882_01/backup.112/e10642/glossary.htm#sthref2283 means, that all redo generated after the restored backup has to be applied.
    Since you are using backup controlfile, rman doesn't know all this redo, so the using-clause is IMHO a *must* here.

    ReplyDelete
    Replies
    1. meant to say "the until-clause is a must here" - rman doesn't know which redo has been generated, you have to cancel the recovery.
      It is not a matter of locating files, it is a matter of the semantics of the recover-command

      Delete
    2. ++^ - This is working as defined. Any time you restore a controlfile you are in incomplete recovery. To end incomplete recovery you either need to do it with rman (set until time) or you can do it manually like you did (with the until cancel clause). If you don't, oracle will keep prompting for the next archivelog sequence for ever and ever.

      Delete
    3. Thanks everyone for your feedback. I do see your points. Since I as the DBA/operator know that these are all the logs that I have, I'll have to make sure to use UNTIL CANCEL from now on.

      Delete