August 31, 2012

Database Normalization: What to test for Third Normal Form?

In the last post, you saw the tests (based on the candidate key) that should be executed to check the second normal form (2NF). In this post, let us understand the third normal form (3NF) and the tests that should be executed to check it. View my video on Third Normal Form explained with examples or read on...

First, what is the 3NF? Just to recall, the purpose of normalization is to eliminate insertion, update and deletion anomalies. The tables in a normalized database are intuitive in design. They do not require extra query logic or application logic to query or filter the required data. Now, a table that is in 3NF is already in 2NF. Also, each non-key column depends on the candidate key and nothing else.
Now, let us understand why the following examples are not in 3NF and how to convert them to 3NF?

a. TokenAllocation (Token, Date, CustomerName, CustomerAddress)
This table stores data for tokens given out to customers after they walk in a place with a queue (e.g. bank branch, hospital OPD, ISP customer care center) and wait for their turn to be seen. The tokens always start from 1 each morning. The candidate key in this table is {Token, Date} because this pair is unique in each row. A specific token allocated on a particular date is associated with exactly one customer. So, CustomerName depends on the key. Likewise, a specific token on a particular date is associated with exactly one customer address. But, CustomerAddress also depends on CustomerName (which is not the key). Therefore, this table is not in 3NF.

To convert TokenAllocation table to 3NF, it needs to be broken into two tables:
TokenAllocation (Token, Date, CustomerID)
{Token, Date} is the only candidate key of this table. 
Customer (CustomerID, CustomerName, CustomerAddress)
CustomerID is the candidate key of this table. CustomerName and CustomerAddress depend on the CustomerID. Also, addition of CustomerID column ensures that customers with duplicate names can live in the same table.
Once TokenAllocation is taken to 3NF, the CustomerAddress values need not be updated in multiple rows in the TokenAllocation table. This is because CustomerAddress now lives only in the Customer table.

b. OrderDetails (OrderNumber, ProductNumber, Quantity, Total)
This table stores data of each line item of each order. The order information is stored in the Orders table. The product information including unit price is stored in the Products table. The candidate key in the OrderDetails table is {OrderNumber, ProductNumber}. Quantity depends on the specific order and the specific product. But the Total column contains a calculated value obtained by multiplying unit price for the specific product by the quantity. So, Total does not depend on the key but something else. Therefore, this table is not in 3NF. To convert it to 3NF, the Total column needs to be dropped: 
OrderDetails (OrderNumber, ProductNumber, Quantity)

Based on the understanding above, here the tests that should be applied to check 3NF on every table in the database:
1. Is each criterion of 1NF and 2NF satisfied?
2. What are the candidate keys in each table? For each candidate key, which columns are not a part of it? Does each such column depend on the candidate key and nothing else?

Want to learn more? See more explanation with example data in my video on Third Normal Form.