19 December 2007

A Sub-Prime I/O Primer

My fans on oracle-l already know that I've had a bit of a battle with Oracle I/O recently, most of the damage being self-inflicted. I'd like to give as forensic a review as possible so that those poor souls who come after me will have some hope and inspiration to carry on.

First of all, some definitions, as they relate to Oracle:

Direct I/O: This is I/O done without the use of the OS (or filesystem) buffer cache. Oracle already has its own buffer cache, so the filesystem cache is (usually) makes I/O take much longer than just bypassing it and relying solely on Oracle to handle the data buffer cache, which it should do much better (for it's own needs) than the filesystem.

Asynchronous I/O: This type of I/O means that I/O commands are sent but other processing can continue before the I/O has finished.

NOTE: These two kinds of I/O, direct and asynchronous, are completely independent of each other. For some reason that I can't explain, I had it stuck in my Coke-addled mind that direct was the opposite of asynch I/O. As many times as I read over guides and glossaries, it wouldn't shake loose.

And now, for our feature presentation.

Act I
In which our hero becomes bewildered with the possibilities
I had been idling in #oracle when hali mentioned something to someone about direct I/O. Having not dealt with I/O very much in my own travels, I inquired more. It was then that I learned of filesystemio_options and the benefits of setting them to "directio."
Needless to say, I was sold.
I set about with my trusty system/storage administator on mounting our vxfs partitions with the proper parameters (mincache=direct,convosync=direct) and setting filesystemio_options=directio, which I'm led to believe is redundant. Once the filesystem is mounted with the directio options, Oracle (and anything else) will always use direct I/O.

Act II
In which direct meets asynchronous
Fast forward three or four weeks and I'm grasping at straws trying to figure out a problem that causes the production instance to hang. (I'm not going to address these hangs in this post, as I'm not really sure that they have to do with direct I/O.) There seemed to be a perfect storm of changes in the weeks prior to these problems happening

  1. Migrated to a 64-bit server (via datapump exp/imp)
  2. Much larger SGA (from 1.5 Gb to 16 Gb)
  3. Dramatically different datafile layout
  4. Using direct I/O
And that's just to name the ones I can think of at this moment. I decided (after weeks of tinkering elsewhere) to look at direct I/O. It was during this second look that I realized that one probably should also have asynchronous I/O enabled.

Act III
Paradise Lost
And so I set filesystemio_options=setall. There were no problems upon instance restart. However this showed up in that evenings RMAN backup:
Starting Control File and SPFILE Autobackup at 29-NOV-07
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of Control File and SPFILE Autobackup command on ORA_DISK_1 channel at 11/29/2007 02:00:46 ORA-19502: write error on file "/rman/c-3171457975-20071129-00", blockno 321 (blocksize=16384) ORA-27061: waiting for async I/Os failed Linux-x86_64 Error: 14: Bad address Additional information: -1 Additional information: 1048576
I was seeing similar ORA errors about async I/O in my alert log intermittently when redo logs were being archived. Obviously this wasn't good. Google turned up one reference, which proved to be gold. The VxFS parameter discovered_direct_iosz was set to a default of 256k. We bumped this up to 1024k and the problems disappeared, AS IF BY MAGIC! Greg Rahn also suggested setting max_direct_iosz to 1024k, as well as setting vxio:vol_maxio=2048 to allow 1MB max I/Os.

Greg was also under the impression that VxFS without ODM would not perform asynchronous I/O. He suggested I run an "strace -c" on the LGWR pid, saying that "if you see io_submit and io_getevents in the syscall column, it is using async io on Linux. If you see pwrite64 it is not," and I was happy to see that we were indeed.
oracle:~/sr $ strace -c -p 12605
Process 12605 attached - interrupt to quit
Process 12605 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
80.66    1.142811         298      3831           io_getevents
7.26    0.102820          51      2004           io_submit 
Act IV
Jumping to Conclusions
As I mentioned before, I'm not convinced at all that the hangings had anything to do with the I/O issues. However I must note that since we enabled asynchronous I/O on top of the direct I/O, we haven't had any instance hanging. Where we used to get them once every other day, it has been three weeks since the last incident.

To their end, Oracle support had analyzed our system state dumps and noted that our session cached cursors was at 100% utilization, and a lot of the wait events during the hanging had to do with cursors (cursor: pin S wait on X). Per their recommendation, we have raised the session_cached_cursors parameter, but haven't had a window yet in which to bounce the instance so the parameter can take effect.

If there ever is a point where we figure out exactly what was causing the problem and what the solution was, I'll be sure to write about it in this space. For now I just wanted to show you what an impulsive, ignorant fool I am.
~ Fin ~

01 December 2007

DBWR and the Devil

I was recently re-reading parts I, II, III and IV of Kevin Closson's series on the perils of overconfiguring your db_writer_processes (which I mentioned earlier this year). My instance appeared to be a perfect candidate. It was configured with db_writer_processes=4, using the one-DBWR-for-each-CPU approach that Kevin rails against for most configurations.

After some other reading and investigation, I decided to make the change to only use one DBWR processes. Changes to the db_writer_processes parameter require an instance bounce to take effect, and we had some downtime scheduled anyway.

I'm happy to say that there are no ill effects. I don't the see DBWn process anywhere on the top(1) radar. I can't say I haven't seen any drastic performance improvements, but if I can do the same work at the same level with 1/4 of the resources, I'd call that a win. It's probably a bigger win for our development/test server that had 3 production clones, each with 4 DBWR processes, for a total of 12 DBWRs. This is now 1 each for 3 total, and (again) no reported performance hit.