What happened to my automatic instance recovery?

Yesterday I had to shutdown a production reporting database (Oracle version 10.2.0.4) so they could add RAM to the server. The shutdown immediate wasn’t going fast enough to suit me, so I logged into a second session and issued a shutdown abort. “No problem,” I thought. I already had permission to kill whatever sessions were running, and SMON will automatically perform instance (crash) recovery and clean things up when I open the database later, right?

A few minutes later I got the green light to restart the database, and instead of a clean start, the startup halted after it got to MOUNT:

Total System Global Area 6442450944 bytes
Fixed Size 2093584 bytes
Variable Size 2030046704 bytes
Database Buffers 4378853376 bytes
Redo Buffers 31457280 bytes
Database mounted.
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1:
'/u02/oradata/PROD/foo/DATA/system01.dbf'

Hmm, that’s odd. This is supposed to happen automatically. You get this kind of message when something bad has happened to the online redologs or the datafiles are damaged, and I can feel a cold lump start to form in the pit of my stomach… But no, the redologs are fine.

Well OK then, whatever

SQL> alter database recover datafile 1;

…but then it asks me to recover datafile 2. Obviously, many datafiles are in an inconsistent state. So I issue

SQL> recover database;

Oracle seems happy about that, and in a couple of minutes I am able to open the database.

SQL> alter database open;

Database altered.

But my question was, why did I have to go through this in the first place? Recovery (rolling throught the online redologs) is something that is supposed to happen automatically when you open the database after abnormal termination (a crash or abort). That’s one of the functions of the SMON process. What happened this time?

A clue was found in the alert log, just before I did my shutdown immediate:

Thu Jul 9 14:47:56 2009
Completed: ALTER DATABASE BEGIN BACKUP
Thu Jul 9 14:47:57 2009
Shutting down instance (immediate)

Backup? Why is there a backup taking place? We use RMAN to do backups, and RMAN does not use alter database begin backup, and anyway this is not the backup window.

Ahhh, right. We’re using NetApp’s SnapManager feature in addition to regular RMAN backups. Every hour the NetApp filer takes a snapshot of the Oracle datafiles, and must put them into backup mode first to keep the headers consistent. OK, that explains the BEGIN BACKUP. How is that relevant here?

The answer is that if you do a shutdown abort while any tablespaces are in backup mode, the DBA must manually take those tablespaces out of backup mode before the database will open. In other words, no automatic instance recovery.

I tried this out on my test system, and discovered that in 11g Oracle has improved the error message slightly to help dimwits like me figure out what’s going on:

ORA-10873: file 1 needs end backup before opening a database

…instead of the slightly puzzling

ORA-01113: file 1 needs media recovery

Further explication can be found here:

http://download.oracle.com/docs/cd/B19306_01/backup.102/b14191/osbackup.htm#i1006393

This entry was posted in Databases. Bookmark the permalink.