17 April 2014

Supplemental Logging and Securefiles Causing DBWn to Block

A few weeks back, we began making changes to prepare for using Oracle Golden Gate. One of the first steps required is to enable "minimal supplemental logging" at the database level. We did this during an evening maintenance window. However by the time the morning workload picked up, we started seeing a lot of sessions blocking, and the root blocker was one of the DB Writer (DBWn) processes.

Looking at the blocked sessions, a query similar to this was a common theme:

SET foo_data = :data, foo_time = systimestamp
WHERE foo_id = :id

This statement was run by many sessions from our webservers as part of a page load process. Very high-frequency call rate. We knew that the only change in the database was the minimal supplemental logging. Obviously we were preparing to turn it off, but took some time to look into it. It is important to note that for the most part the contention was only with sessions running this update statement.

15 April 2014

RMAN Redundancy is not a Viable Retention Policy

Originally posted by me on the Pythian blog. This is an older post that I somehow forgot to post on my own blog, but another recent redundancy foul-up reminded me of it.

The story you are about to read is based on actual events. Names and paths have been changed to protect the innocent. I call this scenario “The Perfect Storm” because it took just the right combination of events and configurations. Sadly, this doesn’t make it an unlikely occurrence, so I’m posting it here in hopes that you’ll be able to save yourselves before it’s too late.

I have always had a preternatural dislike for using REDUNDANCY as a retention policy for Oracle RMAN, greatly preferring RECOVERY WINDOW instead, simply because REDUNDANCY doesn’t really guarantee anything valuable to me, whereas RECOVERY WINDOW guarantees that I’ll be able to do a point-in-time recovery to anytime within the past x days. Plus, I had already been burned once by a different client using REDUNDANCY. With the story I’m about to tell, this dislike has turned into violent hatred. I’m going to be light on the technical details, but I hope you’ll still feel the full pain.

07 April 2014

Migrating (and Upgrading!) Your EM12c Repository Database

This week I migrated our EM12c repository database to a new server as part of its promotion to production status. Just to make it a little more exciting, the migration also involved an in-flight upgrade from to Much of this post is directly inspired by Martin Bach's post on the same subject. I ran into a few other snags that weren't mentioned so I thought it would be worthwhile to document the experience here for your benefit.

I'm assuming you have all the software installed (and patched to the latest PSU, right?). Alright then, let's begin!

30 March 2014

ORA-00600 [kkzlpllg:5] When Dropping MView Log

This week a co-worker and I have been doing some rapid-fire testing to improve fast-refresh performance on an old materialized view, which does some summary aggregations on a 1.9 billion row (and growing) master table. One of the things we tested was using the new-in-11gR2 COMMIT SCN feature. There is a great blog post describing the benefits of this feature by Alberto Dell'Era. To quickly summarize and over-simplify, it provides a much faster way to update rows in the materialized view log that are eligible for refresh and purging. This definitely sounds like something we'd want, so let's roll!

Well we quickly hit a snag when testing our creation script the second time around, when it wouldn't let us drop the materialized view log:
ERROR at line 1:
ORA-00600: internal error code, arguments: [kkzlpllg:5], [], [], [], [], [],[], [], [], [], [], []

We found that we also could no longer perform fast refreshes of the materialized view, getting the same ORA-00600 error. Our initial MOS search turned up Doc ID 14158012.8, which indicates Bug 14158012 (Orphan rows in SNAP_LOGDEP$ causes ORA-600 [kkzlpllg:5]). The bug description is:
With this bug, when there are orphan rows in SNAP_LOGDEP$ with RSCN=NULL,
a CREATE MATERIALIZED VIEW or DROP MATERIALIZED VIEW statement will report ORA-600 [kkzlpllg:5].

26 March 2014

Battling Bigfile Backup Bottlenecks

Last Friday I kicked off a database backup to an NFS destination, using the standard "backup as compressed backupset database" syntax. Loyal readers of this blog may recall that I'm the proud custodian of a 25 Tb database, so this backup normally takes a few days, with an expected completion on Monday morning. However it was still running on Wednesday, and reviewing the logs I saw that there was just 1 channel (of the original 8) still running. The backup file that this channel was writing happened to include our largest bigfile datafile, which weighs in at nearly 8 Tb. Reviewing my new backup script I realized that I had neglected to specify a SECTION SIZE parameter. An example of its usage is:

RMAN> backup as compressed backupset
2> section size 64G
3> database;

Without it, RMAN has decided to create a backup piece that bundled my 8 Tb datafile with a few others and then write it out to disk on one channel. Obviously this isn't what we wanted.

16 March 2014

Why Is My MView Log Not Purging?

A few weeks ago we saw one of our tablespaces growing at a rate much higher than the others. Taking a look we saw that the biggest users of space were two materialized view logs, one being 110 Gb and the other 60 Gb. These logs were in place to facilitate the fast refresh of two materialized views, one for each log/table. These materialized views did some aggregations (sum) throughout the day on some important base table data. The fast refreshes were completing successfully many times a day, but the logs were not being purged as expected.

In our case, there was only one mview performing a fast refresh on those base tables, so the mview logs should have been completely purged after each refresh. They certainly shouldn't be growing to over 100+ Gb. Looking at the data in the mview log, all records had a SNAPTIME$$ value of "4000/01/01 00:00:00", which is the default value for records in the mview log that have not been refreshed. Once they are refreshed, the SNAPTIME$$ value gets set to SYSDATE and can then be evaluated for purging.

But why was this value not being updated after refresh?

12 March 2014

What to Expect When You're Changing the DB_UNIQUE_NAME

I recently had to change the db_unique_name of a database to make it jive with our typical database/DataGuard naming policy of appending the datacenter location. For the sake of this post let's say it was changed from ORCL to ORCL_NYC, since this database is in our fictional New York City datacenter.

I did a quick set of tests and thought I'd share the findings to save anyone any unpleasant surprises. Here are the things to expect when changing DB_UNIQUE_NAME.

Change the Parameter Value

First we obviously have to change the value. How we do so is important.
The command:
alter system set db_unique_name=orcl_nyc scope=spfile;

will result in a db_unique_name of ORCL_NYC, in all uppercase, which is used for the path changes we'll discuss later. However using quotes instead:
alter system set db_unique_name='orcl_nyc' scope=spfile;

will result in a lowercase orcl_nyc value used in the parameter and some paths. In either case, the fun begins when you next restart the instance!