Here’s a comment I received from Paul the other day about my database normalization tutorial:
Very nice. Excellent. The only thing I would question would be that a customer could have zero orders. In order to exist as a customer in your system, they must have placed an order at some point. So, the modality should be one.
I always like hearing nice things about my work, but the remark about modality gave me pause. What does that mean? That’s not a word I ever learned at the Database School of Hard Knocks. What is Paul talking about?
Modality, it turns out, is another word for what we database people call cardinality, or “the number of unique things in a set”. You might also hear it referred to as degree amongst erudite set-theory folks. When we talk about one-to-one relationships (or one-to-many, many-to-many, etc.), we’re talking about the modality or cardinality of the relationship between things in different sets. In other words, one row in a table can be related to rows of data in another table. It doesn’t have to be, but it can be, depending on what real-world relationships you are trying to model.
For instance, suppose you have a database of people, and EMAIL_ADDRESS is an optional field in your database. A person might have an email address, or she might not. We say there is a one-to-zero-or-one relationship, because one person can have zero or one email addresses.
If, on the other hand, EMAIL_ADDRESS is not optional, the relationship becomes one-to-one because one person must have one email address.
If, on the third hand, a person can have any number of email addresses, we call that one-to-many, because one person can have many email addresses. If the email address is still optional, we call that one-to-zero-or-many.
Each of these possibilities can be shown in an Entity Relationship Diagram or ERD, which is a visual representation of the relationships in your database. There are different ways of showing the relationships on a diagram, and terrible religious wars have been fought over which system is best. The system I chose for my tutorial uses the following symbols:
Notice how the symbols (which are sometimes called “crowsfeet” for reasons that should be obvious) change the cardinality of the relationship. You should understand that this is not an exhaustive list of entity relationships, and again, there are many different sets of symbols out there.
So what’s all this got to do with my tutorial?
Check out Figure J, the diagram representing Third Normal form. Originally I had drawn it like this:
See if you can spot the problem.
The symbols connecting ORDERS with CUSTOMERS say,
Every order has exactly one customer
Every customer can have any number of orders; maybe none.
In other words, a customer could exist in the database without ever placing an order.
I call this a “problem” rather than a “mistake” because there are lots of databases out in the real world that have customers who have never placed an order. Although logically, the definition of “customer” is “someone who has bought something from you”, I can think of all sorts of reasons why you might want to keep track of order-less customers… maybe their order is pending. Or maybe the customer is just a sales prospect, in which case they should perhaps go into a PROSPECTS table, and then migrate to CUSTOMERS after the order is made… or maybe you could add a STATUS field to the table to differentiate between pending customers, prospective customers, and actual customers… The possibilities are delightfully many, as are the possible solutions.
But guess what? I don’t care. At least not in the context of my tutorial. I wanted things to be clean and simple– I just wanted to model a stack of invoices, where every invoice has a customer, and every customer has made a purchase. But by selecting the wrong crowsfoot symbol, I introduced a whole big mess that would need considerable explanation that would dilute the simplicity of the tutorial.
So, what to do? Easy; just change the symbol connecting CUSTOMERS to ORDERS. Here’s my new Figure J.
Every order has exactly one customer
Every customer has one or more orders.
I’m grateful to Paul for pointing out this inconsistency.