25 April 2014

Don't Fear the EM12c Metric Extensions

A few weeks ago, our customer support team asked us to automate part of their checklist that looks at the number of active sessions in our production database. In EM12c, this seemed like a no-brainer with the Average Active Sessions metric. So I added this to my production incident ruleset and went back to another project. Over the next few days we'd get pinged by EM12c but the support folks would say it shouldn't. After taking a look we realized that we should be looking at USER sessions, excluding the BACKGROUND sessions Oracle creates to run the instance (like DBWn and LGWR).

The trouble was that I couldn't find a metric for just the user sessions. I had resolved myself to having a scheduler job or script run the SQL that I wanted and send an email if it was over my critical threshold. On a whim I put a message out to twitter, and thankfully my friend Leighton answered. He suggested I look at adding a metric extension, something still foreign to me. I had seen the term in the EM12c interface but it sounded like some kind of plugin interface. Turns out it's simply another way of saying "user-defined metrics". Honestly, if they were labelled that way I would have started playing with them much sooner (subtle criticism).

So a quick search turned up a great video that showed just how simple it was to create a metric extension based on a SQL query. In just a few minutes I had the metric extension created, tested and published. You can create many different types of metric extensions all target types, but in my case a simple SQL query for a database instance was all I needed:

select count(*) from v$session
where type='USER' and status='ACTIVE';

I then define the warning and critical thresholds for the count and it's done! I added it to my ruleset (removing the stock Average Active Sessions metric) and haven't looked back.

Since then I've created a few other metric extensions, for example a standby lag check on a standard edition physical standby. My only regret is not taking the time to learn about these sooner. I suggest anyone using EM12c do so sooner rather than later.

17 April 2014

Supplemental Logging and Securefiles Causing DBWn to Block

A few weeks back, we began making changes to prepare for using Oracle Golden Gate. One of the first steps required is to enable "minimal supplemental logging" at the database level. We did this during an evening maintenance window. However by the time the morning workload picked up, we started seeing a lot of sessions blocking, and the root blocker was one of the DB Writer (DBWn) processes.

Looking at the blocked sessions, a query similar to this was a common theme:

UPDATE foo
SET foo_data = :data, foo_time = systimestamp
WHERE foo_id = :id

This statement was run by many sessions from our webservers as part of a page load process. Very high-frequency call rate. We knew that the only change in the database was the minimal supplemental logging. Obviously we were preparing to turn it off, but took some time to look into it. It is important to note that for the most part the contention was only with sessions running this update statement.

15 April 2014

RMAN Redundancy is not a Viable Retention Policy


Originally posted by me on the Pythian blog. This is an older post that I somehow forgot to post on my own blog, but another recent redundancy foul-up reminded me of it.

The story you are about to read is based on actual events. Names and paths have been changed to protect the innocent. I call this scenario “The Perfect Storm” because it took just the right combination of events and configurations. Sadly, this doesn’t make it an unlikely occurrence, so I’m posting it here in hopes that you’ll be able to save yourselves before it’s too late.

I have always had a preternatural dislike for using REDUNDANCY as a retention policy for Oracle RMAN, greatly preferring RECOVERY WINDOW instead, simply because REDUNDANCY doesn’t really guarantee anything valuable to me, whereas RECOVERY WINDOW guarantees that I’ll be able to do a point-in-time recovery to anytime within the past x days. Plus, I had already been burned once by a different client using REDUNDANCY. With the story I’m about to tell, this dislike has turned into violent hatred. I’m going to be light on the technical details, but I hope you’ll still feel the full pain.

07 April 2014

Migrating (and Upgrading!) Your EM12c Repository Database

This week I migrated our EM12c repository database to a new server as part of its promotion to production status. Just to make it a little more exciting, the migration also involved an in-flight upgrade from 11.2.0.3 to 11.2.0.4. Much of this post is directly inspired by Martin Bach's post on the same subject. I ran into a few other snags that weren't mentioned so I thought it would be worthwhile to document the experience here for your benefit.

I'm assuming you have all the software installed (and patched to the latest PSU, right?). Alright then, let's begin!