The whole truth behind candidate keys

A reader of my tutorial on the 3 Normal Forms recently asked:

The question is: why was customer_id not used as part of the composite primary key (order_id, customer_id , item_id) in 1NF in the first place?

It’s a good question. The questioner identified one of the normalization issues that I simplified in order to make (I hope) my tutorial more readable.

Going back to the tutorial, we said that every table row has to have a column or group of columns that uniquely identifies it, right? We called this the Primary Key. Formal database jargon refers to this as a “candidate key”. There could be more than one candidate key in a table; the one that we actually select becomes the Primary Key.

In truth, that’s only part of the story. The candidate key has to have two properties: First is Uniqueness (as we just stated); the second property is called Irreducibility (or Minimality).

Let’s consider some examples of the three columns the questioner mentioned: order_id, customer_id, item_id:

 order_id  customer_id  item_id
125 56 563
125 56 851
125 56 652
126 2 563

Does this set of columns satisfy the Uniqueness property? Yes, it does. Every combination of {order_id, customer_id, item_id} is unique in our table, and this makes sense according to the invoicing rules of our imaginary business: Every item ordered by a customer appears once and only once on an invoice, and invoices are never duplicated (if a customer makes a duplicate order, we issue a new invoice).

The trouble is, it is redundant to say this. We don’t call {order_id, customer_id, item_id} the candidate key because customer_id isn’t required to satisfy the uniqueness requirement; each row is already uniquely identified by {order_id, item_id}. This is what is meant by the Irreducibility property: If a set of columns can be reduced to a smaller set of columns that still satisfies the Uniqueness property, then it does not qualify as a candidate key.

Here’s a somewhat more formal definition of these two properties (quoted from C.J. Date’s An Introduction to Database Systems, 7th Ed., p. 258):

Let K be a set of attributes of relvar R. Then K is a candidate key for R if and only if it possesses both of the following properties:

  1. Uniqueness: No legal value of R ever contains two distinct tuples with the same value for K.
  2. Irreducibility: No proper subset of K has the uniqueness property.

If you’re unfamiliar with database set theory terminology, here relvar (“relation variable” in case you’re curious) means “table” and tuple means “row”. A proper subset is a subset that does not include the whole set itself (in other words if you have a set of things {a,b,c} then {a} is a subset, {a,b} is a subset, {a,b,c} is a subset, and so forth; a proper subset includes {a}, {a,b}, etc., but does not include {a,b,c}).

I don’t think it’s necessary to get into this kind of detail in the tutorial itself, but it’s worthwhile mentioning and I’m glad someone asked.

2 Responses to “The whole truth behind candidate keys”


  • can we use customer-id and order-id as composite primary key?

  • Sana, in short, no you may not. Think about if the same customer (e.g. ’56’) ordered the same item (e.g. ‘563’) on a later invoice (i.e. a subsequent unique order). A simple sfw (Select-From-Where) query where ‘customer-id’ and ‘order-id’ were being used as the primary key would yield both invoices where customer ’56’ had ordered item ‘563’. Unfortunately, this primary key would not uniquely identify each tuple (row).

Leave a Reply