Database Design and Normalization Misconceptions

“Which is your favorite subject in computer science curriculum?”

Well, mine is “Database Design” and I believe it is one of the most important subjects of computer science.

In my 10-year long career, from a developer to an engineering manager, I have been part of many product teams. They were from different domains and with different scalability needs. During this journey, I have seen Database Design and Normalization from various perspectives.

I want to share my thoughts around these two fundamental database concepts.

First let’s hear what Wikipedia says about Database Design.

“Database design is the process of producing a detailed data model of database. This data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a data definition language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.”

Phew! I am glad we don’t have to remember this anymore, thanks to the Internet.

Let me elaborate the two words logical and physical design choices from above with an example.

My 3-year old son packs his bag everyday on his own. He puts books in a big pocket, water bottle in a side pocket and lunch box in the top pocket of his bag. Almost everyone will do the same thing. Why? Because they are logical and physical design choices.

Another example can be people shopping at supermarkets. It is almost impossible for someone to go to a grocery section to buy furniture unless the store is designed poorly. Everyday we encounter many such situations where we traverse through the organized world. i.e. choices based on logical and physical design.

But, when we take on Database Design as a task in our professional life, many of us do the mistake of selecting “databases” first, such as MySQL, MSSQL, MongoDB rather than finding more about data and relationships between them.

It is like while designing super-market, we think about Wooden or Steel shelves for organizing things first rather than identifying what different types of items super-market is going to sell.

The other such topic is Normalization also my favorite interview topic. Many of the programmers have a misconception that normalization creates performance issues with unnecessary joins.

Let’s check the definition:

“Database Normalization, or simply normalization, is the process of organizing the attributes and relations of a relational database to reduce data redundancy and improve data integrity.”

One misconception around normalization is that it is applied while creating tables and not while designing databases. Though its practical implementation is at table-level only, but that depends.

Depends on what?

3 years old carries books and water in multi-pocket bag. A young software engineer carries laptop in a laptop bag. His loving wife packs delicious lunch for him in a tiffin box.

To carry something, we need a container but it depends on what is being carried.

Similarly, storing data depends on what needs to be stored and how frequently it is accessed. But that doesn’t mean we don’t identify attributes of data and relations between data, which is a essentially database design.

Like many other concepts in computer science, we deal with database design and normalization everyday, but sometimes we simply fail to notice the correlation. Hope this helps and improves perception about database design and normalization especially for the entry-level programmers.