Backup failure… One RMAN’s odyssey

Here’s a cautionary tale about not doing your homework first.

There’s this Oracle 10.2 database on a Windows 2003 server that I manage. I noticed a big hard drive attached to it that was just sitting there unused, so I asked the sysadmin if I could use it for RMAN backups, since they were starting to gradually consume more and more space. He said “sure.”

So I modified the backup script to use this new K: drive, but the next morning discovered the backup had failed with the following error:

RMAN-00571: ==================================================
RMAN-00569: ========= ERROR MESSAGE STACK FOLLOWS ============
RMAN-00571: ==================================================
RMAN-03009: failure of backup command on c1 channel at 07/21/2009 22:46:18
ORA-19502: write error on file 
  "K:\ORACLE\RMAN\PROD1\PROD1_4364_1_692830812_RMAN.DBF", 
  blockno 524545 (blocksize=8192)
ORA-27070: async read/write failed
OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 112) There is not enough space on the disk.
ORA-19502: write error on file 
  "K:\ORACLE\RMAN\PROD1\PROD1_4364_1_692830812_RMAN.DBF", 
  blockno 524161 (blocksize=8192)
ORA-27070: async read/write failed

Strange. I checked the disk; no files were there. Plenty of free space, though; over 400 GB (and the compressed backup size for this database is typically only 8 GB).

I reviewed the backup statistics recorded in v$rman_backup_job_details and found that the backup had failed after writing just 4,099 MB. It was also much slower than usual, running at 0.6 MB/s (usually it runs at 2.3 MB/s).

select
  (select instance_name from v$instance) || ' ' ||
    (select instance_number from v$instance) instance
  ,to_char(start_time,'yyyy-mm-dd hh24:mi') start_time
  ,to_char(output_bytes/1048576,'999,999,999.9') output_mb
  ,to_char(output_bytes_per_sec/1048576,'999,999.9') mb_S
  ,time_taken_display elapsed
  ,status
from v$rman_backup_job_details
order by start_time
/

INSTANCE   START_TIME        OUTPUT_MB MB_S  ELAPSED    STATUS
---------- ----------------- --------- ----- ---------- -------------
prod1 1    2009-07-18 21:00  8,846.5   2.3   01:02:59   COMPLETED
prod1 1    2009-07-19 21:00  8,852.6   2.3   01:04:08   COMPLETED
prod1 1    2009-07-20 21:00  8,921.5   2.3   01:04:51   COMPLETED
prod1 1    2009-07-21 21:00  4,099.0    .6   01:46:07   FAILED

Hmm, that 4,099 MB output size sounds fishy. Isn’t there something about a 4 GB file size limit on some Windows filesystems? How can I find out what filesystem is on that disk?

You could use the Disk Management snap-in (compmgmt.msc) but since GUIs are for sissies I use the DISKPART command-line utility:

d:\oracle\>diskpart

Microsoft DiskPart version 5.2.3790.3959
Copyright (C) 1999-2001 Microsoft Corporation.
On computer: ORASRV-1

DISKPART> list disk

Disk ###  Status      Size     Free     Dyn  Gpt
--------  ----------  -------  -------  ---  ---
Disk 0    Online        17 GB  4080 KB
Disk 1    Online        34 GB      0 B
Disk 2    Online        34 GB      0 B
Disk 3    Online       137 GB      0 B
Disk 4    Online       137 GB      0 B
Disk 5    Online        34 GB      0 B
Disk 6    Online        34 GB      0 B
Disk 7    Online       466 GB      0 B
Disk 8    Online       466 GB      0 B

DISKPART> select disk 8

Disk 8 is now the selected disk.

DISKPART> detail disk

WDC WD50 00AAKS-00YGA0 USB Device
Disk ID: D6232DTD
Type   : USB
Bus    : 0
Target : 0
LUN ID : 0

Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
----------  ---  -----------  -----  ----------  -------  ---------  --------
Volume 3     K   HD-W983      FAT32  Partition    466 GB  Healthy

Aha. K: is a FAT32. And the file size limit for FAT32 is 4 GB (or actually, (2^32)-1 bytes, which is 4 GB minus one byte).

OK, that explains why the backup failed. I can workaround that by tweaking my RMAN configuration so that it creates files no bigger than 4 GB-1, right?

RMAN> configure maxsetsize to 4294967295;

But wait… Is that something I really want to do? Take another look at that DISKPART output. The K: drive is USB. Dang. So that’s why the backup went so slow!

OK, that does it then. That K: drive is completely unsuitable for any Oracle-related purpose. If you have Metalink access you can read more about problems using USB drives with Oracle in Note 604494.1: Errors backing up to a USB disk device. Basically the slowness of the USB writes causes Oracle to fill up the memory buffer before it can get emptied, and the operating system winds up killing the Oracle process to protect itself. I should have taken a closer look at this disk before even attempting to use it for the Oracle backup.

The moral of the story: always look a gift horse in the mouth.

This entry was posted in Databases. Bookmark the permalink.

2 Responses to Backup failure… One RMAN’s odyssey

  1. Thank you so much for this precious information.

  2. Rod says:

    What about mounting the usbdisk using parameter “sync” ?

    Recently I faced this connection problem and using synchronous write solved it.

Comments are closed.