Here’s a cautionary tale about not doing your homework first.
There’s this Oracle 10.2 database on a Windows 2003 server that I manage. I noticed a big hard drive attached to it that was just sitting there unused, so I asked the sysadmin if I could use it for RMAN backups, since they were starting to gradually consume more and more space. He said “sure.”
So I modified the backup script to use this new K: drive, but the next morning discovered the backup had failed with the following error:
RMAN-00571: ================================================== RMAN-00569: ========= ERROR MESSAGE STACK FOLLOWS ============ RMAN-00571: ================================================== RMAN-03009: failure of backup command on c1 channel at 07/21/2009 22:46:18 ORA-19502: write error on file "K:\ORACLE\RMAN\PROD1\PROD1_4364_1_692830812_RMAN.DBF", blockno 524545 (blocksize=8192) ORA-27070: async read/write failed OSD-04016: Error queuing an asynchronous I/O request. O/S-Error: (OS 112) There is not enough space on the disk. ORA-19502: write error on file "K:\ORACLE\RMAN\PROD1\PROD1_4364_1_692830812_RMAN.DBF", blockno 524161 (blocksize=8192) ORA-27070: async read/write failed
Strange. I checked the disk; no files were there. Plenty of free space, though; over 400 GB (and the compressed backup size for this database is typically only 8 GB).
I reviewed the backup statistics recorded in v$rman_backup_job_details and found that the backup had failed after writing just 4,099 MB. It was also much slower than usual, running at 0.6 MB/s (usually it runs at 2.3 MB/s).
select (select instance_name from v$instance) || ' ' || (select instance_number from v$instance) instance ,to_char(start_time,'yyyy-mm-dd hh24:mi') start_time ,to_char(output_bytes/1048576,'999,999,999.9') output_mb ,to_char(output_bytes_per_sec/1048576,'999,999.9') mb_S ,time_taken_display elapsed ,status from v$rman_backup_job_details order by start_time / INSTANCE START_TIME OUTPUT_MB MB_S ELAPSED STATUS ---------- ----------------- --------- ----- ---------- ------------- prod1 1 2009-07-18 21:00 8,846.5 2.3 01:02:59 COMPLETED prod1 1 2009-07-19 21:00 8,852.6 2.3 01:04:08 COMPLETED prod1 1 2009-07-20 21:00 8,921.5 2.3 01:04:51 COMPLETED prod1 1 2009-07-21 21:00 4,099.0 .6 01:46:07 FAILED
Hmm, that 4,099 MB output size sounds fishy. Isn’t there something about a 4 GB file size limit on some Windows filesystems? How can I find out what filesystem is on that disk?
You could use the Disk Management snap-in (compmgmt.msc) but since GUIs are for sissies I use the DISKPART command-line utility:
d:\oracle\>diskpart Microsoft DiskPart version 5.2.3790.3959 Copyright (C) 1999-2001 Microsoft Corporation. On computer: ORASRV-1 DISKPART> list disk Disk ### Status Size Free Dyn Gpt -------- ---------- ------- ------- --- --- Disk 0 Online 17 GB 4080 KB Disk 1 Online 34 GB 0 B Disk 2 Online 34 GB 0 B Disk 3 Online 137 GB 0 B Disk 4 Online 137 GB 0 B Disk 5 Online 34 GB 0 B Disk 6 Online 34 GB 0 B Disk 7 Online 466 GB 0 B Disk 8 Online 466 GB 0 B DISKPART> select disk 8 Disk 8 is now the selected disk. DISKPART> detail disk WDC WD50 00AAKS-00YGA0 USB Device Disk ID: D6232DTD Type : USB Bus : 0 Target : 0 LUN ID : 0 Volume ### Ltr Label Fs Type Size Status Info ---------- --- ----------- ----- ---------- ------- --------- -------- Volume 3 K HD-W983 FAT32 Partition 466 GB Healthy
Aha. K: is a FAT32. And the file size limit for FAT32 is 4 GB (or actually, (2^32)-1 bytes, which is 4 GB minus one byte).
OK, that explains why the backup failed. I can workaround that by tweaking my RMAN configuration so that it creates files no bigger than 4 GB-1, right?
RMAN> configure maxsetsize to 4294967295;
But wait… Is that something I really want to do? Take another look at that DISKPART output. The K: drive is USB. Dang. So that’s why the backup went so slow!
OK, that does it then. That K: drive is completely unsuitable for any Oracle-related purpose. If you have Metalink access you can read more about problems using USB drives with Oracle in Note 604494.1: Errors backing up to a USB disk device. Basically the slowness of the USB writes causes Oracle to fill up the memory buffer before it can get emptied, and the operating system winds up killing the Oracle process to protect itself. I should have taken a closer look at this disk before even attempting to use it for the Oracle backup.
The moral of the story: always look a gift horse in the mouth.
Thank you so much for this precious information.
What about mounting the usbdisk using parameter “sync” ?
Recently I faced this connection problem and using synchronous write solved it.