Archive for the 'Computers' Category

Backup failure… One RMAN’s odyssey

Here’s a cautionary tale about not doing your homework first.

There’s this Oracle 10.2 database on a Windows 2003 server that I manage. I noticed a big hard drive attached to it that was just sitting there unused, so I asked the sysadmin if I could use it for RMAN backups, since the backups were gradually consuming more and more space. He said “sure.”

So I modified the backup script to use this new K: drive, but the next morning discovered the backup had failed with the following error:

RMAN-00571: ==================================================
RMAN-00569: ========= ERROR MESSAGE STACK FOLLOWS ============
RMAN-00571: ==================================================
RMAN-03009: failure of backup command on c1 channel at 07/21/2009 22:46:18
ORA-19502: write error on file 
  "K:\ORACLE\RMAN\PROD1\PROD1_4364_1_692830812_RMAN.DBF", 
  blockno 524545 (blocksize=8192)
ORA-27070: async read/write failed
OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 112) There is not enough space on the disk.
ORA-19502: write error on file 
  "K:\ORACLE\RMAN\PROD1\PROD1_4364_1_692830812_RMAN.DBF", 
  blockno 524161 (blocksize=8192)
ORA-27070: async read/write failed

Strange. I checked the disk; no files were there. Plenty of free space, though; over 400 GB (and the compressed backup size for this database is typically only 8 GB).

I reviewed the backup statistics recorded in v$rman_backup_job_details and found that the backup had failed after writing just 4,099 MB. It was also much slower than usual, running at 0.6 MB/s (usually it runs at 2.3 MB/s).

select
  (select instance_name from v$instance) || ' ' ||
    (select instance_number from v$instance) instance
  ,to_char(start_time,'yyyy-mm-dd hh24:mi') start_time
  ,to_char(output_bytes/1048576,'999,999,999.9') output_mb
  ,to_char(output_bytes_per_sec/1048576,'999,999.9') mb_S
  ,time_taken_display elapsed
  ,status
from v$rman_backup_job_details
order by start_time
/

INSTANCE   START_TIME        OUTPUT_MB MB_S  ELAPSED    STATUS
---------- ----------------- --------- ----- ---------- -------------
prod1 1    2009-07-18 21:00  8,846.5   2.3   01:02:59   COMPLETED
prod1 1    2009-07-19 21:00  8,852.6   2.3   01:04:08   COMPLETED
prod1 1    2009-07-20 21:00  8,921.5   2.3   01:04:51   COMPLETED
prod1 1    2009-07-21 21:00  4,099.0    .6   01:46:07   FAILED
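As a sanity check, the MB_S column can be reproduced from OUTPUT_MB and ELAPSED. Here is a quick Python sketch; the function names are my own and nothing here comes from Oracle:

```python
def elapsed_to_seconds(hhmmss):
    """Convert an 'hh:mm:ss' string, as shown in the ELAPSED column, to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def mb_per_second(output_mb, elapsed):
    """Average throughput in MB/s for a job that wrote output_mb megabytes."""
    return output_mb / elapsed_to_seconds(elapsed)


# Figures copied from the query output above.
print(round(mb_per_second(8846.5, "01:02:59"), 1))  # a normal night: 2.3
print(round(mb_per_second(4099.0, "01:46:07"), 1))  # the failed run: 0.6
```

So the failed run really was crawling at roughly a quarter of the usual rate.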

Hmm, that 4,099 MB output size sounds fishy. Isn’t there something about a 4 GB file size limit on some Windows filesystems? How can I find out what filesystem is on that disk?

You could use the Disk Management snap-in (compmgmt.msc) but since GUIs are for sissies I use the DISKPART command-line utility:

d:\oracle\>diskpart

Microsoft DiskPart version 5.2.3790.3959
Copyright (C) 1999-2001 Microsoft Corporation.
On computer: ORASRV-1

DISKPART> list disk

Disk ###  Status      Size     Free     Dyn  Gpt
--------  ----------  -------  -------  ---  ---
Disk 0    Online        17 GB  4080 KB
Disk 1    Online        34 GB      0 B
Disk 2    Online        34 GB      0 B
Disk 3    Online       137 GB      0 B
Disk 4    Online       137 GB      0 B
Disk 5    Online        34 GB      0 B
Disk 6    Online        34 GB      0 B
Disk 7    Online       466 GB      0 B
Disk 8    Online       466 GB      0 B

DISKPART> select disk 8

Disk 8 is now the selected disk.

DISKPART> detail disk

WDC WD50 00AAKS-00YGA0 USB Device
Disk ID: D6232DTD
Type   : USB
Bus    : 0
Target : 0
LUN ID : 0

Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
----------  ---  -----------  -----  ----------  -------  ---------  --------
Volume 3     K   HD-W983      FAT32  Partition    466 GB  Healthy

Aha. K: is a FAT32 volume. And the file size limit for FAT32 is 4 GB (or actually, (2^32)-1 bytes, which is 4 GB minus one byte).
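The block numbers in the ORA-19502 messages line up with that limit. The sketch below assumes a block's byte offset is roughly blockno times blocksize; that is my assumption, as I have not checked exactly how RMAN numbers its blocks:

```python
FAT32_MAX_FILE_BYTES = 2**32 - 1   # 4 GB minus one byte
BLOCK_SIZE = 8192                  # blocksize reported in the ORA-19502 messages


def approx_offset(blockno, block_size=BLOCK_SIZE):
    """Approximate byte offset of a block, assuming offset = blockno * block_size."""
    return blockno * block_size


# Block numbers copied from the error stack above.
for blockno in (524161, 524545):
    offset = approx_offset(blockno)
    print(blockno, offset, offset - FAT32_MAX_FILE_BYTES)
```

Under that assumption, one failed write starts about 1 MB short of the limit and the other about 2 MB past it, which is what you would expect from a file that hit the FAT32 ceiling.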

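OK, that explains why the backup failed. I can work around that by tweaking my RMAN configuration so that it creates files no bigger than 4 GB-1, right?

RMAN> configure maxsetsize to 4294967295;

(Strictly speaking, MAXSETSIZE caps the size of a whole backup set; the setting that caps each backup piece, i.e. each file, is the channel’s MAXPIECESIZE.)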

But wait… Is that something I really want to do? Take another look at that DISKPART output. The K: drive is USB. Dang. So that’s why the backup went so slow!

OK, that does it then. That K: drive is completely unsuitable for any Oracle-related purpose. If you have Metalink access you can read more about problems using USB drives with Oracle in Note 604494.1: Errors backing up to a USB disk device. Basically the slowness of the USB writes causes Oracle to fill up the memory buffer before it can get emptied, and the operating system winds up killing the Oracle process to protect itself. I should have taken a closer look at this disk before even attempting to use it for the Oracle backup.

The moral of the story: always look a gift horse in the mouth.

Testing disk I/O with the O_DIRECT flag

Sometimes I find the need to test disk IO rates, especially in the context of one of those delightful intra-office debates where the DBAs are blaming the lousy hardware for bad database performance and the sysadmins are blaming the lousy database.

You can perform a simple test of disk IO using the dd command. First, create a 1 GB file full of zeros:

dd if=/dev/zero of=1G bs=1024k count=1024

Then scan and time it…

time dd if=1G of=/dev/null bs=1024k count=1024

…and then use the output from that to calculate your MB/s.
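If you would rather not do the division by hand, something like this works. It is a minimal sketch that assumes the bash built-in time format ("real 0m20.000s"); the sample timing is made up for illustration and the function names are mine:

```python
import re


def real_seconds(time_output):
    """Pull the 'real' elapsed time out of bash `time` output, in seconds."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


def dd_rate_mb_per_s(block_count, block_mb, time_output):
    """MB/s for a dd run that moved block_count blocks of block_mb megabytes."""
    return block_count * block_mb / real_seconds(time_output)


# Hypothetical timing, not a measurement from any real server.
sample = """
real    0m20.000s
user    0m0.004s
sys     0m0.912s
"""
print(dd_rate_mb_per_s(1024, 1, sample))  # 1024 MB in 20 s
```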

This technique is flawed because it does not account for OS-level caching, which can give a significant boost to IO. The Linux 2.4 kernel allows programs to disable this caching by setting the O_DIRECT flag. Starting in Oracle 9.2 you can use this feature by setting the FILESYSTEMIO_OPTIONS init parameter to DIRECTIO or SETALL. (Direct IO is a controversial subject in the Linux community but Oracle has built significant functionality around it.)

Therefore, if Oracle does not use OS-level caching (i.e. FILESYSTEMIO_OPTIONS is set to DIRECTIO or SETALL), the results you get from the dd test will not be representative of your Oracle database.
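To see what the flag looks like from a program's point of view, here is a rough Python sketch of a sequential read with and without O_DIRECT. It assumes Linux (os.O_DIRECT does not exist elsewhere) and a filesystem that accepts the flag; timed_read is a name I made up, and this is an illustration rather than a proper benchmark:

```python
import mmap
import os
import time


def timed_read(path, block_size=1024 * 1024, direct=False):
    """Read a file sequentially; return (bytes_read, elapsed_seconds).

    With direct=True the file is opened with O_DIRECT, so reads bypass the
    OS page cache. Some filesystems reject the flag with an OSError.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    fd = os.open(path, flags)
    # O_DIRECT needs an aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, block_size)
    total = 0
    start = time.perf_counter()
    try:
        while True:
            n = os.readv(fd, [buf])
            if n == 0:          # end of file
                break
            total += n
    finally:
        elapsed = time.perf_counter() - start
        os.close(fd)
        buf.close()
    return total, elapsed


# Example use against the file created by dd:
#   nbytes, secs = timed_read("1G", direct=True)
#   print(nbytes / 1048576 / secs, "MB/s")
```

Running it once with direct=False and once with direct=True against the same file should show how much of the apparent speed comes from the cache.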

The standard dd command does not provide a way to set the O_DIRECT flag (I think this has changed in RHEL5), but Oracle provides a free replacement for the standard Linux coreutils package, which contains the dd and cp commands. The folks at Oracle so generously provided this replacement package to help us debug Clusterware problems, but it can be used for other tasks as well. The replacement can be found at http://oss.oracle.com/projects/coreutils/files/

Oracle has not updated their “official” coreutils package since RHEL3, but there is an updated unofficial version for RHEL4 here: http://oss.oracle.com/~smushran/.coreutils/RHEL4-i386/

I was told that the RHEL3 version would work fine with CentOS 4 (and, presumably, RHEL4), but the package installer complained about missing dependencies when I tried to force-install it, so rather than risk breaking something on a production server I sought a different solution.

I downloaded Robin Miller’s dt utility (http://www.scsifaq.org/RMiller_Tools/index.html) and this provided O_DIRECT functionality. To use it, simply unzip the archive for your platform and then move the single dt executable to your system path (or execute it directly from the unzip folder; it requires no special installation).

dt allows you to create a big file similar to one created by dd; then it will scan that file and calculate the transfer rate for you.

When you run dt in output mode, it performs both a write and a read test:

$ dt of=1G limit=1g bs=1m dispose=keep flags=direct

Write Statistics:
Total records processed: 1024 @ 1048576 bytes/record (1024.000 Kbytes)
Total bytes transferred: 1073741824 (1048576.000 Kbytes, 1024.000 Mbytes)
Average transfer rates: 29059319 bytes/sec, 28378.241 Kbytes/sec
Number I/O's per second: 27.713
Total passes completed: 0/1
Total errors detected: 0/1
Total elapsed time: 00m36.95s
Total system time: 00m00.19s
Total user time: 00m06.30s

Read Statistics:
Total records processed: 1024 @ 1048576 bytes/record (1024.000 Kbytes)
Total bytes transferred: 1073741824 (1048576.000 Kbytes, 1024.000 Mbytes)
Average transfer rates: 50840048 bytes/sec, 49648.485 Kbytes/sec
Number I/O's per second: 48.485
Total passes completed: 1/1
Total errors detected: 0/1
Total elapsed time: 00m21.12s
Total system time: 00m00.13s
Total user time: 00m10.08s

Total Statistics:
Output device/file name: 1G (device type=regular)
Type of I/O's performed: sequential (forward)
Data pattern read/written: 0x39c39c39
Total records processed: 2048 @ 1048576 bytes/record (1024.000 Kbytes)
Total bytes transferred: 2147483648 (2097152.000 Kbytes, 2048.000 Mbytes)
Average transfer rates: 36980948 bytes/sec, 36114.207 Kbytes/sec
Number I/O's per second: 35.268
Total passes completed: 1/1
Total errors detected: 0/1
Total elapsed time: 00m58.07s
Total system time: 00m00.32s
Total user time: 00m16.38s
Starting time: Fri Nov 14 15:09:12 2008
Ending time: Fri Nov 14 15:10:10 2008

When you run it in input mode, it performs just a read test:

$ dt if=1G limit=1g bs=1m dispose=keep flags=direct

Total Statistics:
Input device/file name: 1G (device type=regular)
Type of I/O's performed: sequential (forward)
Data pattern read: 0x39c39c39
Total records processed: 1024 @ 1048576 bytes/record (1024.000 Kbytes)
Total bytes transferred: 1073741824 (1048576.000 Kbytes, 1024.000 Mbytes)
Average transfer rates: 55290516 bytes/sec, 53994.645 Kbytes/sec
Number I/O's per second: 52.729
Total passes completed: 1/1
Total errors detected: 0/1
Total elapsed time: 00m19.42s
Total system time: 00m00.09s
Total user time: 00m09.30s
Starting time: Fri Nov 14 15:10:38 2008
Ending time: Fri Nov 14 15:10:57 2008

Try running it without the flags=direct argument, and you will notice a significant difference in these stats.

Renaming your Windows XP user account

Someone recently asked me how to rename their C:\Documents and Settings\myusername folder, but they didn’t leave a correct email address so my reply bounced. So here, for the benefit of all, is my answer.

Under the Documents and Settings is a folder saved in my name instead of my sons. When I had the new hard drive installed and XP the tech used my name instead of my son. Since it is a system folder I cant change it and really don’t want to pay another bill for trying to. Any other option?

Tosca

Tosca,

As far as I know, you cannot change the name of that folder since it is linked to your user name, and that cannot be changed. (You can change the user name alias, which appears on the login screen, but that does not change the name of the underlying folder in “Documents and Settings”.)

You can, however, create a new account, transfer all your settings from the old account profile to the new one, and then delete the old one.

In detail:

Let’s say you have an account named “dad” and you want to change it to “son”.

Go to Control Panel, User accounts, Create a new account.

Type in the account name (“son”), click Next, make the account type “administrator”.

Restart Windows and log in using the “son” account you just created.

Right-click on the “My Computer” icon on the desktop, choose Properties, and click the Advanced tab.

Under “User Profiles”, click Settings.

Select the “dad” profile from the list, then click “Copy to”.

Click the Browse button, and browse to the location of your new user profile (it should be in C:\Documents and Settings\son).

Click OK, Yes (when it asks if you want to delete the contents of the directory), then OK out of all the dialog boxes.

Restart the computer and log in as “son” again.

Finally, you can delete the “dad” account from the Control Panel, User Accounts.

Hope this saves you a trip to the tech guy!

E-Z Putty Tunnelling

Putty is great if you only have a handful of servers that you connect to on a regular basis. But as a consultant with a growing list of clients you can rapidly find yourself having to deal with a larger and larger number of servers, and scrolling through that little list of Putty saved sessions can become a nightmare.

Then there’s tunnelling. It’s easy to set up tunnels in Putty, so that you can connect to a gateway and then use that connection to jump to a target server in the client’s network, but connecting is always a 2-step process. You have to hunt for and double-click on the tunnel session, and THEN double-click on the connection for the server you actually want to work on. If you’re using Linux or Cygwin you could write a shell script that would streamline things, but under Windows you’re stuck with a whole mess of clicking.

And here’s another annoyance: suppose you spend hours setting up all your Putty profiles, and then you need to transfer those settings to another computer. It’s possible, but it involves dumping and importing registry keys.

Well, life is too short. Here’s a technique for bypassing these putative drawbacks of Putty. It assumes you

  1. have Putty and Plink installed; and
  2. are running Windows.

I assume you have a basic understanding of what ssh tunnelling is and know how to set it up using the Putty GUI. There are a ton of HOWTOs out there on the subject so I won’t belabour the point here.

No brain surgery is involved. We just write a Windows batch file for every connection we want to make. Then we can categorize these batch files however we like in the filesystem, make shortcuts on the toolbar, the desktop, whatever. The syntax is simple, and you only have to double-click once per connection.

Let’s suppose we want to access a corporate LAN via a gateway:
gateway.foobar.tv

and behind that gateway, there are three servers (not visible to the Internet) that we regularly work on:
dev.foobar.tv
prod.foobar.tv
qa.foobar.tv

First, for this set of servers, write a batch file that defines environment variables that are common to all the connections in the group:

localenv.bat

@set GATEWAY_USER=fred
@set PROD_PORT=3001
@set DEV_PORT=3002
@set QA_PORT=3003

The @ simply prevents these statements from echoing to the screen when they are executed.

The *_PORT numbers are of your own choosing; they are used to bind ports on your own computer to the different tunnels. To get an idea of what ports are currently in use on your machine, do netstat -an at a command prompt.
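If you would rather test specific candidates than eyeball a long listing, a few lines of Python will do it. This is a minimal sketch: port_is_free is my own helper, and it only tells you whether something is bound to the port on the loopback address at the moment you ask:

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


# The port numbers chosen in localenv.bat above.
for name, port in (("PROD_PORT", 3001), ("DEV_PORT", 3002), ("QA_PORT", 3003)):
    print(name, port, "free" if port_is_free(port) else "in use")
```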

Next, write a batch file that will launch both the tunnel and the target server connection together:

prod.bat
(line numbers added for reference only)

[1] @CALL localenv.bat
[2] start "Prod" /b "C:\Program Files\PuTTY\plink.exe" -ssh -l %GATEWAY_USER% -L %PROD_PORT%:prod.foobar.tv:22 -P 22 gateway.foobar.tv
[3] @ping 127.0.0.1 -n 3 -w 1000 > nul
[4] "C:\Program Files\PuTTY\putty.exe" -ssh -l fred -P %PROD_PORT% localhost

Line 1 loads the environment variables.
Line 2 launches a Plink connection in the background. Why use Plink? You could have used Putty, but Plink just provides a plain vanilla DOS window that maintains the tunnel whereas Putty is a pleasing interface that you can copy/paste in and customize to suit you. Notice the “start” command, which launches Plink in its own process (kind of like appending “&” to the end of a BASH command); the /b switch keeps Plink in the batch file’s own console window rather than opening a new one.
Line 3 is our way of getting around the fact that the Windows batch language does not by default provide a “wait” or “sleep” command. There are numerous ways to get around this limitation… This line basically waits a couple of seconds while Plink sets up the tunnel. You may have to adjust this wait time depending on network conditions.
Line 4 creates your main Putty window.

I trust it is clear from this example how to create dev.bat and qa.bat as well.
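If you have a lot of servers, you could even generate the batch files. Here is a hedged Python sketch that follows the prod.bat pattern; the function name, its parameters and the idea of deriving the port variable from the server name are my own conventions, not anything Putty requires:

```python
PLINK = r"C:\Program Files\PuTTY\plink.exe"
PUTTY = r"C:\Program Files\PuTTY\putty.exe"


def batch_text(name, host, gateway="gateway.foobar.tv", target_user="fred"):
    """Return the text of <name>.bat, following the prod.bat pattern."""
    port_var = "%" + name.upper() + "_PORT%"   # e.g. %DEV_PORT% from localenv.bat
    lines = [
        "@CALL localenv.bat",
        f'start "{name.capitalize()}" /b "{PLINK}" -ssh -l %GATEWAY_USER% '
        f"-L {port_var}:{host}:22 -P 22 {gateway}",
        "@ping 127.0.0.1 -n 3 -w 1000 > nul",
        f'"{PUTTY}" -ssh -l {target_user} -P {port_var} localhost',
    ]
    return "\r\n".join(lines) + "\r\n"         # batch files expect CRLF line endings


for name in ("dev", "qa"):
    print(batch_text(name, f"{name}.foobar.tv"))
```

To save the results, write each string out with open(name + ".bat", "w", newline="") so that Python does not alter the line endings.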

Another annoyance is that the server will frequently boot you off if you are idle for too long. While there is no command line switch for sending keepalives, you can do this: Set up a new session using the Putty GUI. Go to the Connection section and enter, say, “300” at “Seconds between keepalives”. Don’t fill in anything else, just save the session, give it a name, and then add it to Line 4 of your batch file like this:

[4] "C:\Program Files\PuTTY\putty.exe" -load "default_settings" -ssh -l fred -P %PROD_PORT% localhost

This will send a packet to the server every 5 minutes and (hopefully) keep your idle session active.

So there you have it. A convenient way to manage tunnel connections with Putty that is easily transportable from one computer to another.

3 Normal Forms

At long last, and after repeated fan requests, I have gotten around to converting my 3 Normal Forms tutorial into a printable PDF, which can be downloaded here.

This tutorial, targeted at beginners who are trying to learn the basics without the complexities of advanced database theory, has received rave reviews from such exotic locales as India, Sweden and Bhutan. The tutorial is free; all I ask for in return is the occasional kind word and soggy crust of bread.

If you don’t know or care anything about database normalization, my tutorial still has value as a soporific.

Normalization Tutorial

Back in the olden days

My first contact with computers came in 1977 or ’78 when I was in grade 5 or 6. Because I was smart, I got to spend a half-day a week in the computer lab at the University of Alberta with a half-dozen or so other “gifted” kids. I think about that experience a lot and wonder how it colored my later love/hate relationship with computers.

The computer lab consisted of dumb terminals wired into an Amdahl mainframe. I don’t remember the model number, but the U. of A. must have been on the bleeding edge of new technology at the time since I see that Amdahl came out with its first mainframe in 1975. I think it was in the Education building, but I’m not sure. We got little pink cards for each program we worked with that contained our login name and the program name (I don’t think we had passwords). One program I remember working with was FUNDP, “Fundamentals of Data Processing”, which was a BASIC-like program. Maybe it was BASIC. I can find no Googlable reference to FUNDP, so I’m starting to think that FUNDP was just the course name, though there was no formal classroom instruction; we all went at our own pace. FUNDP (pronounced “fun dip”) permitted us to write scripts, take user input, and do some very primitive graphical manipulations (like place text in various places around the screen; this permitted one to create crude bitmap images with asterisks).

Then there was FRAND, a French language learning program. It was quite a fantastic multi-media experience, for its day. Before logging in you would take a reel of still images and an audio tape (it might have been reel-to-reel since cassette tapes were not in common usage yet) from a cabinet on the wall. Then you would sit down at one of the special multi-media workstations which were equipped with a TV-sized projection screen (apart from the CRT monitor itself) and an audio tape player. You would cue the image reel into the projector and the audio tape into its player, and then log into the FRAND program on your terminal. The projection screen would show an image, the audio tape would ask you a question, and you would register your answer on the terminal screen with a light pen. The whole mess had to be cued properly and started at the same time or it would be all out of sync, but once started I think that signals from the CPU would auto-advance the film and audio based on the user’s interaction with the program.

The conceit of this program was that you didn’t need to know any French, and that it would use only French to get you to learn French. So the question would be, “Qu’est-ce que c’est?”, you would see an image of a question mark, followed by an image of a grapefruit, and then the audio, “C’est un pamplemousse.” And in this way you were to learn how to say “grapefruit” in French.

I guess the people who wrote the program thought this was pretty clever, but I didn’t have a freakin clue what was going on. Getting the machine to work was an easy thing for me, but the content was beyond my reach: I couldn’t grasp that the machine was showing me a picture, asking me what it was, and then telling me what it was. My inability to grasp this simple device has made me very skeptical of science fiction stories where alien races are able to learn to communicate easily. If I was stuck in a cave with a sentient slime-creature from Zebulon Prime I would probably just eat it before learning how to communicate with it. So much for being “smart”.

It has also made me sympathetic to older people (and Mac users) who are unable to grasp the simplest computer-related concepts that I take for granted, like the difference between core and physical storage, or the meaning of instructions like “right-click on the Start button”, or the difference between “double-click on icons” and “single-click on buttons”. The makers of Windows think these things are obvious, and perhaps they are obvious to me and indeed most computer users these days, but I can understand how some people can (and do) have a blockage in understanding such things.