26 March 2014

Battling Bigfile Backup Bottlenecks

Last Friday I kicked off a database backup to an NFS destination, using the standard "backup as compressed backupset database" syntax. Loyal readers of this blog may recall that I'm the proud custodian of a 25 TB database, so this backup normally takes a few days, with an expected completion on Monday morning. However, it was still running on Wednesday, and on reviewing the logs I saw that just 1 channel (of the original 8) was still active. The backup piece that this channel was writing happened to include our largest bigfile datafile, which weighs in at nearly 8 TB. Reviewing my new backup script, I realized that I had neglected to specify a SECTION SIZE parameter. An example of its usage is:

RMAN> backup as compressed backupset
2> section size 64G
3> database;

Without it, RMAN created a backup piece that bundled my 8 TB datafile with a few others and wrote it out to disk on one channel. Obviously this isn't what we wanted.


I'm not a big fan of bigfile tablespaces, primarily because you lose the benefits of parallelism when a huge file can only be handled by a single channel, as with datafile copy operations and backups. In 11g, however, Oracle introduced multi-section backups via the SECTION SIZE option for RMAN backups. This option tells RMAN to break a large file into sections of the specified size so that the work can be done in parallel. If the specified size is larger than the file, it is simply ignored for that file.

There is a limitation in that the file can be split into a maximum of 256 sections. So, if you specify a section size that would result in more than 256 sections being created, Oracle RMAN will increase the size so that exactly 256 sections are created. This is still enforced today in Oracle 12c.
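The arithmetic behind that cap can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not RMAN's actual code: the function name `plan_sections` is hypothetical, the 8 TiB file size is a round stand-in for my "nearly 8 TB" datafile, and the exact rounding RMAN applies is an assumption.

```python
MAX_SECTIONS = 256  # per-file cap on sections, as described above

GIB = 1024 ** 3
TIB = 1024 ** 4


def plan_sections(file_bytes: int, requested_bytes: int) -> tuple[int, int]:
    """Return (section_size_bytes, section_count) for one datafile.

    Hypothetical helper: mirrors the documented behaviour, with the
    rounding details assumed rather than taken from RMAN itself.
    """
    if requested_bytes >= file_bytes:
        # Section size larger than the file: ignored, one whole-file piece.
        return file_bytes, 1
    # Smallest section size that keeps the count at or below the cap
    # (integer ceiling division avoids floating-point rounding).
    floor_size = -(-file_bytes // MAX_SECTIONS)
    size = max(requested_bytes, floor_size)
    count = -(-file_bytes // size)
    return size, count


# 64 GiB sections on an 8 TiB file: 128 sections, under the cap.
size, count = plan_sections(8 * TIB, 64 * GIB)
print(size // GIB, "GiB x", count)

# 16 GiB would mean 512 sections, so the size is raised until 256 remain.
size, count = plan_sections(8 * TIB, 16 * GIB)
print(size // GIB, "GiB x", count)
```

In practice this means that on a multi-terabyte file a very small section size buys nothing: anything below file size / 256 gets bumped up to that value anyway.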

Another limitation in Oracle 11g is that multi-section backups cannot be done with image copy backups. Those must still be done as a whole file and so can still be a huge bottleneck. However this is no longer a problem in Oracle 12c, and multi-section image copy backups are possible. I'm looking forward to using this as we also use image copy backups as part of our recovery strategy.

To highlight the parallel benefits, I ran a compressed backup of a 1.5 TB bigfile tablespace using 8 channels. The first run does NOT use section size and so uses only one channel:

Starting backup at 2014/03/25 14:35:41
...
channel c1: backup set complete, elapsed time: 17:06:09
Finished backup at 2014/03/26 07:41:51

The second one uses a section size of 64 GB (otherwise same command & file):

Starting backup at 2014/03/26 07:41:52
...
Finished backup at 2014/03/26 09:48:33

You can see the huge impact of the multi-section backup option. A single channel took over 17 hours to back the file up. Using 8 channels with a section size of 64 GB took just over 2 hours. Eyeballing the log shows an average of around 6 minutes per section. Definitely much better than waiting for a single channel to do all the work while the other channels sit idle.
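For anyone who wants the exact numbers, here is a small Python sketch that works out the elapsed times and the speedup from the "Starting backup" and "Finished backup" timestamps in the two log excerpts above. The helper name `elapsed_seconds` is just for illustration, and the timestamp format string is my reading of how the log lines are laid out.

```python
from datetime import datetime

# Timestamp layout used in the RMAN log excerpts above (assumed format).
FMT = "%Y/%m/%d %H:%M:%S"


def elapsed_seconds(start: str, finish: str) -> float:
    """Seconds between two log timestamps."""
    delta = datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Run 1: no SECTION SIZE, one channel does all the work.
single = elapsed_seconds("2014/03/25 14:35:41", "2014/03/26 07:41:51")

# Run 2: SECTION SIZE 64G, spread over 8 channels.
multi = elapsed_seconds("2014/03/26 07:41:52", "2014/03/26 09:48:33")

print(f"single channel: {single / 3600:.2f} h")
print(f"multi-section:  {multi / 3600:.2f} h")
print(f"speedup:        {single / multi:.1f}x")
```

The speedup works out to roughly 8x, which lines up with the 8 channels in use.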

6 comments:

  1. I'm still not clear about the exact diffs between "SECTION SIZE" and "MAXPIECESIZE". I use bigfiles all through my dbs, and MAXPIECESIZE 5G channels in all my rman scripts. Adding SECTION SIZE 5G (or replacing MAXPIECESIZE by it) makes absolutely no difference! Takes exactly the same time, bigfiles are all in excess of 50G and backup files all 5G except for a couple here and there of lesser size - usually the last ones to be created. 11.2.0.3 here, patched up with latest PSUs.

    Replies
    1. 11.2.0.3.9 here as well. I haven't played with MAXPIECESIZE yet, I'll give it a go when I have a few spare moments.

    2. Noons, I just kicked off a backup with 8 channels, each with maxpiecesize set, and it's still only using 1 channel. I don't believe that maxpiecesize means parallelization the way section size implies; it just limits the size of the resulting backup piece (handy if some OSes don't support large files).

      I'll let my test run but right now it still looks like you need section size if you're talking about parallel backups of a large file. I'm not sure why it wouldn't make a difference in your case, perhaps your big files are not really big enough to realize the time savings like I do with multi-terabyte files.

      I see from the docs as well that you can't use maxpiecesize and section size together.

  2. Sorry, I forgot to add that I use "PARALLELISM n". In most cases 4, but in some large dbs, 8. And all 4 or 8 channels (rman server processes) are flat out, that I can assure you!
    They all finish within mins of each other, with the last backup files (4 or 8 depending on parallelism) shorter than 5GB, all others at 5GB regardless of the bigfile size.
    In most cases the bigfiles are 300GB and nearly full so even with compression, there is no way each would fit into a single 5GB backup size!
    I don't think the SECTION SIZE and MAXPIECESIZE behaviour has been fully explained.
    Yes, I am aware that SECTION SIZE and MAXPIECESIZE are mutually exclusive, but I am also not seeing the slightest difference between them time-wise or space-wise, if PARALLELISM is set for the latter. And the large db size (actual data, not allocated disk) is well over 4TB. More than sufficient to cause a distinct behaviour if there was a clear cut difference. I'm also using AIX 7.1 and neither ASM nor RAC, but I can't possibly fathom how those could make a diff?

    Replies
    1. Hmm, my disk device configuration is already set for parallelism 4. Also non-RAC on NFS. Wonder why I only got the one channel.

    2. I agree entirely. That's why I'm not sure this subject has been fully explored yet. I think there is more to it than what we are seeing. Let's hope someone with more info casts their eye this way and helps find all the combos that work.
