28 February 2014

The Danger of Moving Incrementally Updated Datafile Copies

When I sat down at my desk yesterday morning, I was greeted with some disturbing email alerts notifying me that one of the NFS mounts on my standby database host was full. This was the NFS mount that held an image copy of my database that is updated daily from an incremental backup. The concept and an example can be found in the documentation. With a 25 TB database, waiting to restore from backups is not as appealing as simply switching to the copies and getting back to business.

We quickly saw that the mount was full because RMAN had tried to make another set of image copies in the latest backup run rather than take an incremental backup for recovery. It does this when it finds no valid copy of the datafiles to increment, and the logs confirmed this to be the reason:

Starting backup at 2014/02/26 13:30:16
no parent backup or copy of datafile 792 found
no parent backup or copy of datafile 513 found
no parent backup or copy of datafile 490 found
no parent backup or copy of datafile 399 found

... and so on, for every datafile. However, I knew that the copies that had been there (and been updated) every day were still there. So what was different?

It was then that I remembered my work from the day before. Doing a bit of re-organization, I had renamed the directory where the datafile copies lived. However, I made sure to re-catalog them and double-checked that the backup tag was still there, which it was. I also crosschecked the copies to mark the old entries as expired and then deleted them from the catalog. This turned out to be the cause of the problem.

When the original datafilecopy entries were removed from the catalog, RMAN didn't want to recognize the new entries as the right copies, even though they were literally the same files, with the same backup tag. And so RMAN printed the message you see above and dutifully began making new image copies until it filled up the mountpoint, which didn't have another spare 25 TB handy.
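
A check on the RMAN log could flag this fallback before anyone has to notice a full mount. The following is only a minimal Python sketch: it assumes the message is worded exactly as in the log excerpt above (other RMAN versions may phrase it differently), and the function name `datafiles_without_parent` is my own invention, not anything RMAN provides.

```python
import re

# Pattern copied from the log excerpt in this post (assumption: your
# RMAN version words the message the same way).
NO_PARENT = re.compile(r"no parent backup or copy of datafile (\d+) found")


def datafiles_without_parent(log_text):
    """Return the datafile numbers RMAN reported as having no parent copy."""
    return [int(m.group(1)) for m in NO_PARENT.finditer(log_text)]


if __name__ == "__main__":
    # Two lines from the log shown earlier, used here as sample input.
    sample = (
        "Starting backup at 2014/02/26 13:30:16\n"
        "no parent backup or copy of datafile 792 found\n"
        "no parent backup or copy of datafile 513 found\n"
    )
    missing = datafiles_without_parent(sample)
    if missing:
        print(f"WARNING: RMAN found no parent copy for {len(missing)} datafile(s): {missing}")
```

In a setup like mine, a non-empty result on any run after the first would mean RMAN is about to write full image copies instead of an incremental.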

Today I was able to duplicate the scenario on a (much smaller) sandbox with various sequences. Every time, once I crosschecked the original copies and deleted them as expired, RMAN would create a new copy on the next run. The sequence was basically this:
  1. Run backup-for-recovery and recover commands. First time will create datafile copies as expected.
  2. Run it again. This time it will create an incremental backup and then apply it to the copies made in the previous step.
  3. Move or rename the directory holding the copies.
  4. CROSSCHECK COPY; & DELETE EXPIRED COPY;
  5. CATALOG START WITH '/path/to/new/location/';
  6. LIST DATAFILECOPY ALL; to verify that the copies are registered under the new location and the TAG is right.
  7. Run backup-for-recovery and recover commands (be sure to update the location). I would expect the same results as step 2, but instead new copies are created.
One thing that was very interesting was that if I just cataloged the new location but did not crosscheck or delete the old entries (i.e. skipped step 4), then I could run the script and it would take an incremental backup as planned and recover the copies in the new location. But if I later did the crosscheck and delete, it would no longer accept those copies and would create new ones. And all this time I could "list datafilecopy all;" and see both copies with the same tags. Changing the order of steps 4 and 5 made no difference.
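
To repeat the test with and without step 4, the commands for steps 4 through 7 can be generated rather than typed each time. This is a rough Python sketch, not the script I actually ran: the function name `relocation_script` and its parameters are hypothetical, the tag and path are placeholders, and the two step 7 commands are an assumption based on the usual documented form of an incrementally updated backup. The backup destination is left out, so add your own FORMAT clause or channel configuration for the new location.

```python
def relocation_script(new_dir, tag, skip_crosscheck=False):
    """Build the RMAN commands for steps 4-7 as one string.

    new_dir and tag are placeholders supplied by the caller.
    """
    lines = []
    if not skip_crosscheck:
        # Step 4: mark entries for the old location as expired, then drop them.
        lines += ["CROSSCHECK COPY;", "DELETE EXPIRED COPY;"]
    # Step 5: register the copies under their new directory.
    lines.append(f"CATALOG START WITH '{new_dir}';")
    # Step 6: confirm the new location and the tag.
    lines.append("LIST DATAFILECOPY ALL;")
    # Step 7 (assumed command form; destination clause intentionally omitted).
    lines.append(
        f"BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG '{tag}' DATABASE;"
    )
    lines.append(f"RECOVER COPY OF DATABASE WITH TAG '{tag}';")
    return "\n".join(lines)


if __name__ == "__main__":
    # Full sequence, then the variant that skips step 4.
    print(relocation_script("/path/to/new/location/", "my_tag"))
    print()
    print(relocation_script("/path/to/new/location/", "my_tag", skip_crosscheck=True))
```

The `skip_crosscheck` flag corresponds to the variant described above, where leaving out step 4 let the incremental backup proceed against the relocated copies.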

I'd be interested to know what anyone else thinks about it. Personally, it seems like a bug to me, so I've opened an SR. So far Oracle Support have confirmed what I've experienced, although they have said there is no bug on file. They suggested I use Doc ID 1578701.1 to make another copy of the datafile with a new tag and use that new tag. However, if I wanted to do that I would just create a new database copy and keep using the original tag, which is exactly what I've done.

I will be sure to update this post with anything I find. Until then, I wanted to share this experience for anyone else that might need or want to move their datafile copies if they are part of an incrementally-updated-backup strategy.

1 comment:

  1. Did you find anything from Oracle?
    We are also noticing something similar. Our RMAN keeps forgetting datafile copies at random and makes fresh copies. So far we have not figured out why.

    Also, in our scenario we don't even move these files around. It seems to do fine for a week after we clean up all backups, then random second copies are made. We have a recovery window of 2 weeks.
