Modality? What’s Modality?

Here’s a comment I received from Paul the other day about my database normalization tutorial:

Very nice. Excellent. The only thing I would question would be that a customer could have zero orders. In order to exist as a customer in your system, they must have placed an order at some point. So, the modality should be one.

I always like hearing nice things about my work, but the remark about modality gave me pause. What does that mean? That’s not a word I ever learned at the Database School of Hard Knocks. What is Paul talking about?

Modality, it turns out, is another word for what we database people call cardinality, or “the number of unique things in a set”. You might also hear it referred to as degree amongst erudite set-theory folks. When we talk about one-to-one relationships (or one-to-many, many-to-many, etc.), we’re talking about the modality or cardinality of the relationship between things in different sets. In other words, one row in a table can be related to rows of data in another table. It doesn’t have to be, but it can be, depending on what real-world relationships you are trying to model.

For instance, suppose you have a database of people, and EMAIL_ADDRESS is an optional field in your database. A person might have an email address, or she might not. We say there is a one-to-zero-or-one relationship, because one person can have zero or one email addresses.

If, on the other hand, EMAIL_ADDRESS is not optional, the relationship becomes one-to-one because one person must have one email address.

If, on the third hand, a person can have any number of email addresses, we call that one-to-many, because one person can have many email addresses. If the email address is still optional, we call that one-to-zero-or-many.

Each of these possibilities can be shown in an Entity Relationship Diagram or ERD, which is a visual representation of the relationships in your database. There are different ways of showing the relationships on a diagram, and terrible religious wars have been fought over which system is best. The system I chose for my tutorial uses the following symbols:

  • Every A has exactly one B.
  • Every B has exactly one A.
  • Any A can have no more than one B; maybe none.
  • Any B can have no more than one A; maybe none.
  • Every A has one or more Bs.
  • Any B has no more than one A; maybe none.
One and only one-to-many
  • Every A has one or more Bs.
  • Every B has exactly one A.
One-to-zero or many
  • Any A can have any number of Bs; maybe none.
  • Any B can have no more than one A; maybe none.
  • Every A has one or more Bs.
  • Every B has one or more As.

Notice how the symbols (which are sometimes called “crowsfeet” for reasons that should be obvious) change the cardinality of the relationship. You should understand that this is not an exhaustive list of entity relationships, and again, there are many different sets of symbols out there.

So what’s all this got to do with my tutorial?

Check out Figure J, the diagram representing Third Normal form. Originally I had drawn it like this:

Third Normal Form: incorrect relationship

See if you can spot the problem.

Give up?

The symbols connecting ORDERS with CUSTOMERS say,

Every order has exactly one customer


Every customer can have any number of orders; maybe none.

In other words, a customer could exist in the database without ever placing an order.

I call this a “problem” rather than a “mistake” because there are lots of databases out in the real world that have customers who have never placed an order. Although logically, the definition of “customer” is “someone who has bought something from you”, I can think of all sorts of reasons why you might want to keep track of order-less customers… maybe their order is pending. Or maybe the customer is just a sales prospect, in which case they should perhaps go into a PROSPECTS table, and then migrate to CUSTOMERS after the order is made… or maybe you could add a STATUS field to the table to differentiate between pending customers, prospective customers, and actual customers… The possibilities are delightfully many, as are the possible solutions.

But guess what? I don’t care. At least not in the context of my tutorial. I wanted things to be clean and simple– I just wanted to model a stack of invoices, where every invoice has a customer, and every customer has made a purchase. But by selecting the wrong crowsfoot symbol, I introduced a whole big mess that would need considerable explanation that would dilute the simplicity of the tutorial.

So, what to do?  Easy; just change the symbol connecting CUSTOMERS to ORDERS. Here’s my new Figure J.

Third Normal Form

Which says,

Every order has exactly one customer


Every customer has one or more orders.

I’m grateful to Paul for pointing out this inconsistency.

The whole truth behind candidate keys

A reader of my tutorial on the 3 Normal Forms recently asked:

The question is: why was customer_id not used as part of the composite primary key (order_id, customer_id , item_id) in 1NF in the first place?

It’s a good question. The questioner identified one of the normalization issues that I simplified in order to make (I hope) my tutorial more readable.

Going back to the tutorial, we said that every table row has to have a column or group of columns that uniquely identifies it, right? We called this the Primary Key. Formal database jargon refers to this as a “candidate key”. There could be more than one candidate key in a table; the one that we actually select becomes the Primary Key.

In truth, that’s only part of the story. The candidate key has to have two properties: First is Uniqueness (as we just stated); the second property is called Irreducibility (or Minimality).

Let’s consider some examples of the three columns the questioner mentioned: order_id, customer_id, item_id:

 order_id  customer_id  item_id
125 56 563
125 56 851
125 56 652
126 2 563

Does this set of columns satisfy the Uniqueness property? Yes, it does. Every combination of {order_id, customer_id, item_id} is unique in our table, and this makes sense according to the invoicing rules of our imaginary business: Every item ordered by a customer appears once and only once on an invoice, and invoices are never duplicated (if a customer makes a duplicate order, we issue a new invoice).

The trouble is, it is redundant to say this. We don’t call {order_id, customer_id, item_id} the candidate key because customer_id isn’t required to satisfy the uniqueness requirement; each row is already uniquely identified by {order_id, item_id}. This is what is meant by the Irreducibility property: If a set of columns can be reduced to a smaller set of columns that still satisfies the Uniqueness property, then it does not qualify as a candidate key.

Here’s a somewhat more formal definition of these two properties (quoted from C.J. Date’s An Introduction to Database Systems, 7th Ed., p. 258):

Let K be a set of attributes of relvar R. Then K is a candidate key for R if and only if it possesses both of the following properties:

  1. Uniqueness: No legal value of R ever contains two distinct tuples with the same value for K.
  2. Irreducibility: No proper subset of K has the uniqueness property.

If you’re unfamiliar with database set theory terminology, here relvar (“relation variable” in case you’re curious) means “table” and tuple means “row”. A proper subset is a subset that does not include the whole set itself (in other words if you have a set of things {a,b,c} then {a} is a subset, {a,b} is a subset, {a,b,c} is a subset, and so forth; a proper subset includes {a}, {a,b}, etc., but does not include {a,b,c}).

I don’t think it’s necessary to get into this kind of detail in the tutorial itself, but it’s worthwhile mentioning and I’m glad someone asked.

3 Normal Forms Tutorial update

Many people have, over the years, both in the blog comments and in private messages, objected to my contention in the Normal Forms tutorial that ITEM_PRICE leads to failure of Second Normal Form. I always go back and forth about this, and I generally tell people something like “it depends on the business rules, it’s not a matter for Normalization to solve, I wanted to keep things simple,” etc. Enough people have pointed this out that I finally decided to spell out the problem in the tutorial itself, under the NF2 discussion. Hopefully this will not prove too confusing for beginners, while at the same time keep the advanced readers satisfied!

3 Normal Forms: Spanish Translation

I am happy to announce that my Three Normal Forms tutorial has been translated into Spanish by Roberto Fernandez. You can download it here.

Ancient Treasure

In my apartment building people often leave books they no longer want in the lobby for others to pick up if they want. Usually it’s nothing interesting, but the other day someone left a truly incredible artifact: the Microsoft Excel Version 1.04 manual, from 1986.

This is a true historical document. It’s not the absolutely first version of Excel, but it’s almost the first. It includes the installation media (800KB floppy!) and it’s absolutely pristine. It feels like this thing belongs in a museum.

But the most interesting part is the very last page: a bug report form, for sending “software bugs, documentation errors, or suggested enhancements”. Best of all: Mail the form to Microsoft!

This thing is a treat. Shall we start the bidding at, say, $0.43?



Debugging ORA-01001: invalid cursor

End-to-end performance tuning is something you hear more and more about. I have seen many presentations about how the “modern” DBA has to be intimately concerned with all layers of the application stack and cannot just focus on the database. I’m on board with that, but the reality is, I have rarely seen it in practice. The only time DBAs, developers, and netops folks seem to venture out of their silos is during crisis meetings and in emergency-situation chatrooms.

A case in point was when I was asked to troubleshoot an invalid cursor error on a client’s database. Here’s the helpful documentation Oracle provides for this error:

oerr ora 1001
01001, 00000, "invalid cursor"
// *Cause:
// *Action:

Hm. I guess that this is one of those things you’re just supposed to “know”. :)

Actually, invalid cursor is generally a problem with the application design. Perhaps the code is squandering resources and opening too many cursors. A common solution is to jack up the value of MAXOPENCURSORS. (Note that this is not an Oracle parameter as some people seem to think. It’s precompiler option. Meaning that you set this in a C header file and recompile your application in order to change it. But don’t ask me about this stuff; I’m a DBA, not a developer, remember?)

Well, there was no chance of throwing this problem back on the developers and saying “it’s your problem, fix it.” The application in this system is a black box, the source code is unavailable, and the vendor who wrote the software is long gone. Must be a pretty sweet time for the developers in this shop; they get to spend their days with their feet up or playing foosball. Thus it was up to us DBA-types to come up with a solution. Where to begin?

The error arose when a user of the application tried to change her password through the web interface. Our first thought was of course to look at the full error message, which provided the line in the code where the error arose:

09/20/2009 12:24:13 => User edit failed:
updateUser(): 62004
ORA-01001: invalid cursor
ORA-06512: at "APP_OWNER.SECURITY_CHECK", line 634
ORA-06512: at "APP_OWNER.SECURITY_CHECK", line 105
ORA-04088: error during execution of trigger

The top point of failure (updateUser) was in the compiled app code, and inaccessible to us. But the rest of the stack was in the server-side PL/SQL code. The first of these (SECURITY_CHECK line 634) is a SELECT using CAST and MULTISET functions. What does that tell me? Well, I know that BULK COLLECT is often preferred over MULTISET because it involves fewer context switches between the SQL and PL/SQL engines… but that would be indicative of a CPU issue, and we seem to be dealing with a cursor cache memory issue here.

I have the client’s DBA try to recreate the error in the QA environment, but he can’t. That’s an interesting tidbit, but it’s not definitive because the QA and production environments are not (alas) completely identical. So there could be some configurable database parameter (shared pool? PGA?) that might help with this.

But rather than blindly start tweaking parameters, it’s time to look at what the user’s session is actually doing right up until the point of failure. I have my DBA buddy run the password update while I repeatedly query v$session but it goes by too fast; I am unable to identify his session amidst the regular user chatter (it’s not a terribly busy database, which is why I thought this might work). So I set up a logon trigger to capture user activity in a tracefile:

create or replace trigger enable_sql_trace after logon on database
if( dbms_session.is_role_enabled('SQL_TRACE') )
execute immediate 'alter session set timed_statistics = TRUE';
execute immediate 'alter session set max_dump_file_size = unlimited';
end if;

Then I set up the SQL_TRACE role and assign it to the application user:

create role sql_trace;
grant sql_trace to app_user;

The DBA runs his update, gets the error, but… no new tracefile appears in USER_DUMP_DEST. So I turn off my trigger…

revoke sql_trace from app_user;
alter trigger enable_sql_trace disable;

…and I go back to to looking at v$session. Why was no new session created? I look at the INACTIVE sessions… say, there’s a whole bunch of APP_USER sessions here that have been logged in since… last March! That can only mean… Connection pool!

The DBA said yes, indeed, the app server runs a Resin connection pool in the middle tier. Well, case closed. We scheduled a restart of Resin, all the cursors that the app sessions had been hanging onto were freed, and the problem vanished (for the next 6 months or so, anyway).

My point is, I could have been a much more effective troubleshooter if I had known more about the architecture of the full application stack. Situations like this inspire me to get outside of my database box and ask more questions… but really, this kind of groping in the dark seems standard; I see it all the time.

See Note 1007395.6: Common Causes of ORA-01001 for more background on ORA-01001.

Backup failure… One RMAN’s odyssey

Here’s a cautionary tale about not doing your homework first.

There’s this Oracle 10.2 database on a Windows 2003 server that I manage. I noticed a big hard drive attached to it that was just sitting there unused, so I asked the sysadmin if I could use it for RMAN backups, since they were starting to gradually consume more and more space. He said “sure.”

So I modified the backup script to use this new K: drive, but the next morning discovered the backup had failed with the following error:

RMAN-00571: ==================================================
RMAN-00569: ========= ERROR MESSAGE STACK FOLLOWS ============
RMAN-00571: ==================================================
RMAN-03009: failure of backup command on c1 channel at 07/21/2009 22:46:18
ORA-19502: write error on file 
  blockno 524545 (blocksize=8192)
ORA-27070: async read/write failed
OSD-04016: Error queuing an asynchronous I/O request.
O/S-Error: (OS 112) There is not enough space on the disk.
ORA-19502: write error on file 
  blockno 524161 (blocksize=8192)
ORA-27070: async read/write failed

Strange. I checked the disk; no files were there. Plenty of free space, though; over 400 GB (and the compressed backup size for this database is typically only 8 GB).

I reviewed the backup statistics recorded in v$rman_backup_job_details and found that the backup had failed after writing just 4,099 MB. It was also much slower than usual, running at 0.6 MB/s (usually it runs at 2.3 MB/s).

  (select instance_name from v$instance) || ' ' ||
    (select instance_number from v$instance) instance
  ,to_char(start_time,'yyyy-mm-dd hh24:mi') start_time
  ,to_char(output_bytes/1048576,'999,999,999.9') output_mb
  ,to_char(output_bytes_per_sec/1048576,'999,999.9') mb_S
  ,time_taken_display elapsed
from v$rman_backup_job_details
order by start_time

---------- ----------------- --------- ----- ---------- -------------
prod1 1    2009-07-18 21:00  8,846.5   2.3   01:02:59   COMPLETED
prod1 1    2009-07-19 21:00  8,852.6   2.3   01:04:08   COMPLETED
prod1 1    2009-07-20 21:00  8,921.5   2.3   01:04:51   COMPLETED
prod1 1    2009-07-21 21:00  4,099.0    .6   01:46:07   FAILED

Hmm, that 4,099 MB output size sounds fishy. Isn’t there something about a 4 GB file size limit on some Windows filesystems? How can I find out what filesystem is on that disk?

You could use the Disk Management snap-in (compmgmt.msc) but since GUIs are for sissies I use the DISKPART command-line utility:


Microsoft DiskPart version 5.2.3790.3959
Copyright (C) 1999-2001 Microsoft Corporation.
On computer: ORASRV-1

DISKPART> list disk

Disk ###  Status      Size     Free     Dyn  Gpt
--------  ----------  -------  -------  ---  ---
Disk 0    Online        17 GB  4080 KB
Disk 1    Online        34 GB      0 B
Disk 2    Online        34 GB      0 B
Disk 3    Online       137 GB      0 B
Disk 4    Online       137 GB      0 B
Disk 5    Online        34 GB      0 B
Disk 6    Online        34 GB      0 B
Disk 7    Online       466 GB      0 B
Disk 8    Online       466 GB      0 B

DISKPART> select disk 8

Disk 8 is now the selected disk.

DISKPART> detail disk

WDC WD50 00AAKS-00YGA0 USB Device
Disk ID: D6232DTD
Type   : USB
Bus    : 0
Target : 0
LUN ID : 0

Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
----------  ---  -----------  -----  ----------  -------  ---------  --------
Volume 3     K   HD-W983      FAT32  Partition    466 GB  Healthy

Aha. K: is a FAT32. And the file size limit for FAT32 is 4 GB (or actually, (2^32)-1 bytes, which is 4 GB minus one byte).

OK, that explains why the backup failed. I can workaround that by tweaking my RMAN configuration so that it creates files no bigger than 4 GB-1, right?

RMAN> configure maxsetsize to 4294967295;

But wait… Is that something I really want to do? Take another look at that DISKPART output. The K: drive is USB. Dang. So that’s why the backup went so slow!

OK, that does it then. That K: drive is completely unsuitable for any Oracle-related purpose. If you have Metalink access you can read more about problems using USB drives with Oracle in Note 604494.1: Errors backing up to a USB disk device. Basically the slowness of the USB writes causes Oracle to fill up the memory buffer before it can get emptied, and the operating system winds up killing the Oracle process to protect itself. I should have taken a closer look at this disk before even attempting to use it for the Oracle backup.

The moral of the story: always look a gift horse in the mouth.

What happened to my automatic instance recovery?

Yesterday I had to shutdown a production reporting database (Oracle version so they could add RAM to the server. The shutdown immediate wasn’t going fast enough to suit me, so I logged into a second session and issued a shutdown abort. “No problem,” I thought. I already had permission to kill whatever sessions were running, and SMON will automatically perform instance (crash) recovery and clean things up when I open the database later, right?

A few minutes later I got the green light to restart the database, and instead of a clean start, the startup halted after it got to MOUNT:

Total System Global Area 6442450944 bytes
Fixed Size 2093584 bytes
Variable Size 2030046704 bytes
Database Buffers 4378853376 bytes
Redo Buffers 31457280 bytes
Database mounted.
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1:

Hmm, that’s odd. This is supposed to happen automatically. You get this kind of message when something bad has happened to the online redologs or the datafiles are damaged, and I can feel a cold lump start to form in the pit of my stomach… But no, the redologs are fine.

Well OK then, whatever

SQL> alter database recover datafile 1;

…but then it asks me to recover datafile 2. Obviously, many datafiles are in an inconsistent state. So I issue

SQL> recover database;

Oracle seems happy about that, and in a couple of minutes I am able to open the database.

SQL> alter database open;

Database altered.

But my question was, why did I have to go through this in the first place? Recovery (rolling throught the online redologs) is something that is supposed to happen automatically when you open the database after abnormal termination (a crash or abort). That’s one of the functions of the SMON process. What happened this time?

A clue was found in the alert log, just before I did my shutdown immediate:

Thu Jul 9 14:47:56 2009
Thu Jul 9 14:47:57 2009
Shutting down instance (immediate)

Backup? Why is there a backup taking place? We use RMAN to do backups, and RMAN does not use alter database begin backup, and anyway this is not the backup window.

Ahhh, right. We’re using NetApp’s SnapManager feature in addition to regular RMAN backups. Every hour the NetApp filer takes a snapshot of the Oracle datafiles, and must put them into backup mode first to keep the headers consistent. OK, that explains the BEGIN BACKUP. How is that relevant here?

The answer is that if you do a shutdown abort while any tablespaces are in backup mode, the DBA must manually take those tablespaces out of backup mode before the database will open. In other words, no automatic instance recovery.

I tried this out on my test system, and discovered that in 11g Oracle has improved the error message slightly to help dimwits like me figure out what’s going on:

ORA-10873: file 1 needs end backup before opening a database

…instead of the slightly puzzling

ORA-01113: file 1 needs media recovery

Further explication can be found here:

v$osstat anomaly

The v$osstat view shows certain OS-level metics such as number of CPUs, current system load, accumulated IO time, etc. Starting in 10gR2, it also records PHYSICAL_MEMORY_BYTES, which is the total size of physical memory on the server. See the documentation.

I think I found an anomaly. On one particular database I take care of (version running on RedHat 4 64-bit), PHYSICAL_MEMORY_BYTES appears to be the amount of free memory available on the system, not total memory size like it says in the docs.

On that system, if I run the following two commands from within sqlplus —

SQL> ho cat /proc/meminfo |grep ^MemFree |tr -s ' ' |cut -d' ' -f2
SQL> select value from v$osstat
  where stat_name='PHYSICAL_MEMORY_BYTES';

…the results are really, really close. Less than a few dozen bytes apart. This leads me to believe that on this particular database, v$osstat is getting its value for PHYSICAL_MEMORY_BYTES from free memory, not from total memory.

DBA_HIST_OSSTAT tracks this value over time. It should not change unless you are changing the amount of installed memory on your server. But here is how PHYSICAL_MEMORY_BYTES changes on my RHEL4 64-bit system:

select * from (
  select snap_id, value
    , value-lag(value) over(order by snap_id) delta
  from dba_hist_osstat
  where stat_name='PHYSICAL_MEMORY_BYTES'
  order by snap_id
) where rownum <= 50
   SNAP_ID      VALUE      DELTA
---------- ---------- ----------
     28389    2220048
     28390    2171328     -48720
     28391    1938152    -233176
     28392    2131860     193708
     28393    2123272      -8588
     28394    2097128     -26144
     28395    2070008     -27120
     28396    2068092      -1916
     28397    2076164       8072
     28398    2059132     -17032
     28399    2045440     -13692
     28400    2037288      -8152

A database installed on identical hardware/OS reports no change in PHYSICAL_MEMORY_BYTES:

   SNAP_ID      VALUE      DELTA
---------- ---------- ----------
     29125 8359936000
     29126 8359936000          0
     29127 8359936000          0
     29128 8359936000          0
     29129 8359936000          0
     29130 8359936000          0
     29131 8359936000          0
     29132 8359936000          0
     29133 8359936000          0

…so I am tempted to suppose that this is a 10gR2 bug that was fixed in However, I checked on a database running on Windows, and there is no anomaly:

--------- ---------- ----------
    43747 4093104128
    43748 4093104128          0
    43749 4093104128          0
    43750 4093104128          0
    43751 4093104128          0
    43752 4093104128          0
    43753 4093104128          0
    43754 4093104128          0
    43755 4093104128          0
    43756 4093104128          0
    43757 4093104128          0

It appears that this is either a Linux-specific bug that was fixed in, or perhaps a bug that only occurs in (I don’t have a database running on Windows to test this out on). However, I could find no mention of this as a bug on Metalink.

cursor: pin S wait on X

I had a little excitement with the cursor: pin S wait on X wait event when a client’s website came grinding to a halt.

The first thing I checked was blocking sessions…

col username form a12
col event form a25 word_wrapped
col blocking_session head "BLOCKING|SID"
select sid, username, event, blocking_session,
seconds_in_wait, wait_time
from v$session
where state = 'WAITING'
and wait_class != 'Idle'

…but there were none (i.e. the BLOCKING_SESSION column was empty). There were, however, a large number of sessions engaged in cursor: pin S wait on X waits.
I took down a few SIDs and ran this query to find out what SQL they were executing:

col sql_text form a150 word_wrap
col child_number head CH# for 999
sid,  hash_value,
from v$sql
where hash_value in (select sql_hash_value from v$session where sid in (&sid))

They were all executing the same PL/SQL package. I checked out that package–

select status,
to_char(last_ddl_time, 'yyyy-mm-dd hh24:mi') last_ddl_time
from dba_objects
where upper(owner)=upper('&own')
and upper(object_name) = upper('&name')

…and discovered that it had last been modified 15 minutes ago. At that point the site’s DBA came online and he informed me that he had recompiled the package and then had been called away to an unrelated emergency.

Hmm, rolling out untested code on a production server? Grrr… Well, no time for that now; we had to get the website unstuck. He fixed the code and recompiled it, but this did not release those cursor: pin S wait on X waits.

Instead of killing each session one by one (there were dozens of them), I identified the root blocker, whose SID is embedded in the p2raw column of v$session:

select p2raw, count(*)
from v$session
where event = 'cursor: pin S wait on X'
group by p2raw
P2RAW            COUNT(*)
---------------- --------
0000028100000000 76
000001EF00000000  3

I went for 0000028100000000 since it had the most sessions. The SID of the blocker is embedded in the first 8 bytes of that string– 281 (hexadecimal). Converted to decimal, the SID is 641.We killed that session…

select serial# from v$session where sid=641;
alter system kill session '641,17938';

…and all the cursor: pin S wait on X waits evaporated and the site sprang to life.
So obviously, this was a blocking situation. Why didn’t my original query, the one that checked for blocking sessions, identify any blockers?

It’s because of a bug in Oracle 10g: the v$session.blocking_session column does not get populated in the case of cursor: pin S wait on X waits. This has apparently been fixed in 11g.

There are a number of bugs pertaining to this wait event. They didn’t apply here, since our smoking gun was our own buggy code. Check out Asif Momen‘s blog for further references.