Oracle + SunCluster + Backup = Danger

At work, we have a big Sun Cluster hosting three major business-critical Oracle databases. Today I had to perform a short maintenance task on one of those DBs.

I switched off the resource group… Switched on… And a ping-pong game started between the nodes… 🙁 After x attempts, the resource group was marked “offline”. In the logs, I read:

Jun 20 23:06:28 milou SC[SUNWscor.oracle_server.start]:xxxx-rg:xxxx-db-res: [ID 735336 user.error] Media error encountered, but Auto_end_bkp is disabled.

My Oracle knowledge is very limited (I’m not a DBA), but Sunsolve gave me the following info:

Auto_End_Bkp (Boolean)
A feature that performs the following recovery actions in the event of an interrupted Oracle RDBMS hot backup:
* Recognizes when a datafile fails to open because of files left in hot backup mode. This verification process occurs when Sun Cluster HA for Oracle starts.
* Identifies and releases all files left in hot backup mode.
* Opens the database for use.
You can turn this feature ON and OFF. The default state is OFF (False).
Default: False
Range: None
Tunable: Any time

There was indeed a backup running when the DB was taken offline… After more searching on Sunsolve, I found the following document: Sun[TM] Cluster 3.1: Error message “Media error encountered, but Auto_end_bkp disable” (available only to registered Sunsolve users), which describes a procedure to solve this issue.

The magic command was:

# scrgadm -c -j xxxx-db-res -x Auto_End_Bkp=true

Then a normal restart did the rest:

# scswitch -Z -g xxxx-rg

Conclusion: Before working on Oracle resource groups, always check that no backup is running!
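
One way to make that check routine is to ask Oracle itself whether any datafile is still in hot backup mode before touching the resource group. The sketch below is only an illustration, not the procedure from the Sunsolve document: it assumes you run it as the Oracle software owner with ORACLE_SID set, that sqlplus is on the PATH, and that OS authentication ("/ as sysdba") is allowed. The function names and the SQLPLUS override variable are made up for this example. It also only detects files put in backup mode with BEGIN BACKUP; an RMAN backup would not show up here.

```shell
#!/bin/sh
# Hypothetical pre-maintenance check: is any datafile still in hot backup mode?
# SQLPLUS is an illustrative override so the check can be tried without a database.
SQLPLUS="${SQLPLUS:-sqlplus}"

# Print the number of datafiles whose v$backup status is ACTIVE.
active_backup_files() {
    "$SQLPLUS" -S "/ as sysdba" <<'EOF' | tr -d '[:space:]'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT COUNT(*) FROM v$backup WHERE status = 'ACTIVE';
EXIT
EOF
}

# Return 0 when no file is in backup mode, 1 when some are,
# and 2 when the answer is not a plain number (for example an ORA- error).
safe_to_switch() {
    count=$(active_backup_files)
    case "$count" in
        0)
            echo "No datafile in hot backup mode"
            return 0 ;;
        ''|*[!0-9]*)
            echo "Could not query v\$backup: $count" >&2
            return 2 ;;
        *)
            echo "$count datafile(s) still in hot backup mode" >&2
            return 1 ;;
    esac
}
```

After sourcing the file, running safe_to_switch before switching the resource group offline tells you whether it is better to wait for the backup to finish (or to ask a DBA) first.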