At work, we have a big Sun Cluster hosting three major business-critical Oracle databases. Today I had to perform a short maintenance on one of those DBs.
I switched off the resource group… Switched on… And a ping-pong game started between the nodes… 🙁 After x attempts, the resource group was marked “offline”. In the logs, I read:
Jun 20 23:06:28 milou SC[SUNWscor.oracle_server.start]:xxxx-rg:xxxx-db-res: [ID 735336 user.error] Media error encountered, but Auto_end_bkp is disabled.
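To spot this message quickly the next time a resource group starts bouncing between nodes, a small filter over the system log can help. This is only a sketch: the function name is made up, and it assumes the cluster messages land in /var/adm/messages (the usual Solaris syslog file); pass another path if your syslog configuration differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sun Cluster): print every log line that
# reports the Auto_end_bkp media error.
# Usage: find_auto_end_bkp_errors [logfile]
# Assumption: cluster messages are written to /var/adm/messages.
find_auto_end_bkp_errors() {
    logfile="${1:-/var/adm/messages}"
    # -i because the property is spelled both "Auto_end_bkp" and "Auto_End_Bkp"
    grep -i 'Media error encountered, but Auto_end_bkp is disabled' "$logfile"
}
```

Calling find_auto_end_bkp_errors without arguments scans the default log; grep returns a non-zero status when nothing matches, so the function can also be used in a test.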
My Oracle knowledge is very limited (I’m not a DBA), but Sunsolve gave me the following info:
Auto_End_Bkp (Boolean)
A feature that performs the following recovery actions in the event of an interrupted Oracle RDBMS hot backup:
* Recognizes when a datafile fails to open because of files left in hot backup mode. This verification process occurs when Sun Cluster HA for Oracle starts.
* Identifies and releases all files left in hot backup mode.
* Opens the database for use.
You can turn this feature ON and OFF. The default state is OFF (False).
Default: False
Range: None
Tunable: Any time
There was indeed a backup running when the DB was taken offline… After more searching on Sunsolve, I found the following document: Sun[TM] Cluster 3.1: Error message “Media error encountered, but Auto_end_bkp disable” (available only to registered Sunsolve users), which describes a procedure to solve this issue.
The magic command was:
# scrgadm -c -j xxxx-db-res -x Auto_End_Bkp=true
Then a normal restart did the rest:
# scswitch -Z -g xxxx-rg
Conclusion: Before working on Oracle resource groups, always check that no backup is running!
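For the record, here is a rough sketch of how such a check could look. It is not taken from the Sunsolve document: the function names are invented, and it assumes that sqlplus is in the PATH, that it runs as the Oracle software owner with ORACLE_SID set, and that OS authentication ("/ as sysdba") is allowed. It counts the datafiles that V$BACKUP reports as ACTIVE, i.e. still in hot backup mode.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Sun Cluster or Oracle).
# Assumptions: sqlplus in PATH, ORACLE_SID set, OS authentication enabled.

# Print the number of datafiles currently in hot backup mode.
count_active_backups() {
    sqlplus -s "/ as sysdba" <<'EOF' | tr -d '[:space:]'
set heading off feedback off pagesize 0
select count(*) from v$backup where status = 'ACTIVE';
exit
EOF
}

# Decide from that count whether the resource group can be switched off.
# $1: output of count_active_backups
# Returns 0 if safe, 1 if a backup is running, 2 if the count is unreadable.
safe_to_switch() {
    case "$1" in
        0)
            echo "no datafile in hot backup mode"
            return 0 ;;
        ''|*[!0-9]*)
            echo "could not read backup status: '$1'" >&2
            return 2 ;;
        *)
            echo "$1 datafile(s) still in hot backup mode" >&2
            return 1 ;;
    esac
}
```

A typical use would be safe_to_switch "$(count_active_backups)" right before touching the resource group. Note that this only sees files put in backup mode by a hot backup (BEGIN BACKUP), which is the case Auto_End_Bkp deals with.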