27 October 2009

HOWTO: Oracle Cross-Platform Migration with Minimal Downtime

Originally posted at The Pythian Group blog.

I recently performed a migration from Oracle 10gR2 on Solaris to the same version on Linux, immediately followed by an upgrade to 11g. Both platforms were x86-64. Migrating to Linux also included migrating to ASM, whereas we had been using ZFS to hold the datafiles on Solaris. Restoring files into ASM meant we would have to use RMAN (which we would probably choose to use anyway).

As with many databases, the client wanted minimal downtime. It was obvious to us that the most time-consuming operation would be the restore and recovery into the new instance. We were basically doing a restore and recovery from production backups and archived redo logs. It quickly dawned on me that we could start this operation well before the scheduled cutover time and downtime window, chopping at least six hours from the downtime window. The client would only need to keep the new instance in mount mode after the initial restore/recovery finished, periodically re-catalog the source instance's FRA (which was mounted via NFS), and then re-run the recover database command in RMAN. Once the time comes to cutover, simply archivelog current the original instance and shutdown immediate. Then open the new instance with the RESETLOGS option, and voila! Migration complete!

I'll try to recreate a simple example here.

On the new instance, assume that you have ASM set up and configured with two disk groups, DATA and REDO. Let's also assume that you have the FRA directory (or the directory where you hold backups and archivelogs) mounted from the Solaris host to the Linux host under /solfra/prod/. Here's the command we used:

mount -o hard,bg,tcp,nfsvers=3,suid,rsize=32768,wsize=32768,nointr,noac,ro solhost:/fra /solfra/

We then need to get a pfile and edit it for ASM-ification. You can either restore the spfile from backup or just export a pfile from the current running production instance. To do the former, you'd need to go into RMAN and run an operation similar to this (best to ensure that the ORACLE_HOME, ORACLE_SID, and other Oracle-related environment variables are set properly):

RMAN> startup force


startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/home/oracle/app/oracle/product/10.2.0/db_1/dbs/initprod.ora'


starting Oracle instance without parameter file for retrival of spfile
Oracle instance started
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of startup command at 06/29/2009 13:34:50
ORA-00205: error in identifying control file, check alert log for more info
RMAN>
RMAN> restore spfile to pfile 'initprod.ora' from autobackup recovery area = '/solfra/' db_unique_name = 'PROD';


Starting restore at 29-JUN-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=36 devtype=DISK


recovery area destination: /solfra/
database name (or database unique name) used for search: PROD
channel ORA_DISK_1: autobackup found in the recovery area
channel ORA_DISK_1: autobackup found: /solfra/PROD/autobackup/2009_06_29/o1_mf_s_690813149_54kypcxz_.bkp
channel ORA_DISK_1: SPFILE restore from autobackup complete
Finished restore at 29-JUN-09

From here you can go into sqlplus and CREATE PFILE FROM SPFILE;.

Now we have our text pfile in $ORACLE_HOME/dbs/initprod.ora, and we want to make these changes specific to the transition to ASM:

*.control_files='+DATA','+REDO'
*.db_create_file_dest='+DATA'
*.db_create_online_log_dest_1='+REDO'
*.db_recovery_file_dest='/path/to/fra'
*.db_recovery_file_dest_size='2000G'

Obviously, the last two parameters about location and size of the FRA should be tailored to your setup.

We then create a new SPFILE with CREATE SPFILE FROM PFILE;.

Now we're ready to restore the controlfile(s) from backup:

RMAN> startup force;


Oracle instance started
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of startup command at 08/13/2009 20:20:56
ORA-00205: error in identifying control file, check alert log for more info


RMAN> restore controlfile from autobackup recovery area = '/solfra/' db_unique_name = 'PROD';


Starting restore at 13-AUG-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=650 devtype=DISK


recovery area destination: /solfra/
database name (or database unique name) used for search: PROD
channel ORA_DISK_1: autobackup found in the recovery area
channel ORA_DISK_1: autobackup found: /solfra/PROD/autobackup/2009_08_13/o1_mf_s_694801261_5890xyoj_.bkp
channel ORA_DISK_1: control file restore from autobackup complete
output filename=+DATA/prod/controlfile/current.256.694815695
output filename=+REDO/prod/controlfile/current.256.694815697
Finished restore at 13-AUG-00

Note that if you restored the spfile from backup, you don't need to specify the recovery area in the restore controlfile from autobackup command. In this case, because I chose to create a pfile from the source instance, I needed to supply the recovery area location. Note that this is the recovery area that holds the source backups, not the recovery area that your new instance will use once it is opened.

With the control files restored, we can mount the new instance:

RMAN> alter database mount;


database mounted
released channel: ORA_DISK_1

The next step is to catalog the backup files and archivelogs from the remote FRA.

RMAN> CATALOG START WITH '/solfra/PROD' NOPROMPT;


Starting implicit crosscheck backup at 13-AUG-09
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=646 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=650 devtype=DISK
Crosschecked 565 objects
Finished implicit crosscheck backup at 13-AUG-09


Starting implicit crosscheck copy at 13-AUG-09
using channel ORA_DISK_1
using channel ORA_DISK_2
Finished implicit crosscheck copy at 13-AUG-09


searching for all files in the recovery area
cataloging files...

This will continue listing all the files in the catalog, and some files that were not cataloged (e.g. online redo logs, non-RMAN files, etc.).

Now we're ready to begin our RESTORE phase. You'll need an rman command file that looks something like this:

run {
set newname for datafile X to new;
# ... (repeat for other datafiles)
set newname for tempfile Y to new;
# ... (repeat for other tempfiles)
restore database;
switch datafile all;
switch tempfile all;
}
recover database;

Note that X and Y are the file# of the individual datafile and tempfile, respectively. You will probably want to generate this file if you have a large number of datafiles.

It is important that you read MetaLink Doc 415884.1, if you are performing a migration to a different platform with the same endian. Oracle suggests that you should perform a CONVERT DATABASE, but later admits that you only need to perform a CONVERT on tablespaces containing UNDO segments:
The convert database command is a no operation command for all files that do not contain undo but they are scanned regardless during the conversion. Using the convert tablespace/datafile for those containing undo will suffice and all other source files not containing undo can be moved to the target (same endian) platform without the Recovery Manager conversion. This will greatly reduce the time required to move and database with thousands of files from source to target platform also decreasing the amount of time the database is unavailable.
We had an internal debate over whether or not it is truly necessary even to do this. However, I would advise you to read this document and discuss it with Oracle Support Services. They would most likely try to convince you to do it or threaten to not support your database.

Once the script finishes with the recover database portion, this query should indicate that datafile headers and controlfiles are in sync:

SQL> alter session set nls_date_format='YYYY/MON/DD hh24:mi:ss';
Session altered.


SQL> select checkpoint_time,fuzzy,count(*),status
from ( select checkpoint_time,fuzzy,status
from v$datafile_header
union all
select controlfile_time,'CTL',null from v$database)
group by checkpoint_time,fuzzy,status;


CHECKPOINT_TIME FUZ COUNT(*) STATUS
-------------------- --- ---------- -------
2009/AUG/14 02:22:11 CTL 1
2009/AUG/14 02:22:11 YES 1266 ONLINE

The database is still in mount mode. If we were ready to cut over now, we could open the database with the RESETLOGS option. However, we are still a few hours or a day away from the cutover. What we want to do is apply more logs from the current production instance as they get archived. To minimize the babysitting, we decided to run this in a script once every few hours:

CATALOG START WITH '/solfra/PROD' NOPROMPT;
RECOVER DATABASE;

As you can guess, this re-catalogs the NFS-mounted FRA, ignoring files that were already cataloged. So now the controlfile knows about the new archivelogs. The RECOVER DATABASE command then beautifully applies logs from where it left off until it runs out of logs again. We re-run this periodically to keep our new instance as close to current as possible until it's time to cut over.

When the cutover time comes, we run this on the current production instance:

SQL> alter system archive log current;


System altered.


SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

Then, on the new instance, we do one last catalog and recover. This should bring us completely up to date with the original prod instance. Now we just need to open with resetlogs:

SQL> alter database open resetlogs;
Database altered.

In my case, I manually renamed the online and standby redo logs to +REDO (they still had the old filesystem names in v$logfile), but I believe that OPEN RESETLOGS does this automatically anyway.

And so, here we are in Linux on on the same version (10.2.0.2) as Solaris with appropriate one-off patches, using ASM. At this point, I was just under 30 minutes into my downtime window since the original prod instance was shutdown. I don't think that's too bad for moving a 1.3T database!

After this, I immediately went into upgrading to 11gR1 (11.1.0.7) and then building a new standby instance, but that is outside the scope of the initial migration that I wanted to share with you today. Cheers!

No comments:

Post a Comment