Normalization in DBMS: 1NF, 2NF, 3NF, and BCNF [Examples]

When developing the schema of a relational database, one of the most important aspects to be taken into account is to ensure that the duplication of data is minimized. We do this by carrying out database normalization, an important part of the database schema design process.

Here, we explain normalization in DBMS, explaining 1NF, 2NF, 3NF, and BCNF with explanations. First, let’s take a look at what normalization is and why it is important.

What is Normalization in DBMS?

Database normalization is a technique that helps design the schema of the database in an optimal way. The core idea of database normalization is to divide the tables into smaller subtables and store pointers to data rather than replicating it.

Why Do We Carry out Database Normalization?

There are two primary reasons why database normalization is used. First, it helps reduce the amount of storage needed to store the data. Second, it prevents data conflicts that may creep in because of the existence of multiple copies of the same data.

If a database isn’t normalized, then it can result in less efficient and generally slower systems, and potentially even inaccurate data. It may also lead to excessive disk I/O usage and bad performance.

What is a Key?

You should also be aware of what a key is. A key is an attribute that helps identify a row in a table. There are seven different types, which you’ll see used in the explanation of the various normalizations:

Candidate Key
Primary Key
Foreign Key
Super Key
Alternate Key
Composite Key
Unique Key

Database Normalization Example

To understand (DBMS)normalization with example tables, let's assume that we are storing the details of courses and instructors in a university. Here is what a sample database could look like:

Course code	Course venue	Instructor Name	Instructor’s phone number
CS101	Lecture Hall 20	Prof. George	+1 6514821924
CS152	Lecture Hall 21	Prof. Atkins	+1 6519272918
CS154	CS Auditorium	Prof. George	+1 6514821924

Here, the data basically stores the course code, course venue, instructor name, and instructor’s phone number. At first, this design seems to be good. However, issues start to develop once we need to modify information. For instance, suppose, if Prof. George changed his mobile number. In such a situation, we will have to make edits in 2 places.

What if someone just edited the mobile number against CS101, but forgot to edit it for CS154? This will lead to stale/wrong information in the database. This problem can be easily tackled by dividing our table into 2 simpler tables:

Table 1 (Instructor):

Instructor ID
Instructor Name
Instructor mobile number

Table 2 (Course):

Course code
Course venue
Instructor ID

Now, our data will look like the following:

Table 1 (Instructor):

Instructor's ID	Instructor's name	Instructor's number
1	Prof. George	+1 6514821924
2	Prof. Atkins	+1 6519272918

Table 2 (Course):

Course code	Course venue	Instructor ID
CS101	Lecture Hall 20	1
CS152	Lecture Hall 21	2
CS154	CS Auditorium	1

Basically, we store the instructors separately and in the course table, we do not store the entire data of the instructor. Rather, we store the ID of the instructor. Now, if someone wants to know the mobile number of the instructor, they can simply look up the instructor table. Also, if we were to change the mobile number of Prof. George, it can be done in exactly one place. This avoids the stale/wrong data problem.

Further, if you observe, the mobile number now need not be stored 2 times. We have stored it in just 1 place. This also saves storage. This may not be obvious in the above simple example. However, think about the case when there are hundreds of courses and instructors and for each instructor, we have to store not just the mobile number, but also other details like office address, email address, specialization, availability, etc. In such a situation, replicating so much data will increase the storage requirement unnecessarily.

Suggested Course

Database Management System (DBMS) & SQL : Complete Pack 2024

Types of DBMS Normalization

There are various normal forms in DBMS. Each normal form has an importance that helps optimize the database to save storage and reduce redundancies. We explain normalization in DBMS with examples below.

First Normal Form (1NF)

The first normal form simply says that each cell of a table should contain exactly one value. Assume we are storing the courses that a particular instructor takes, we can store it like this:

Instructor's name	Course code
Prof. George	(CS101, CS154)
Prof. Atkins	(CS152)

Here, the issue is that in the first row, we are storing 2 courses against Prof. George. This isn’t the optimal way since that’s now how SQL databases are designed to be used. A better method would be to store the courses separately. For instance:

Instructor's name	Course code
Prof. George	CS101
Prof. George	CS154
Prof. Atkins	CS152

This way, if we want to edit some information related to CS101, we do not have to touch the data corresponding to CS154. Also, observe that each row stores unique information. There is no repetition. This is the First Normal Form.

Data redundancy is higher in 1NF because there are multiple columns with the same in multiple rows. 1NF is not so focused on eliminating redundancy as much as it is focused on eliminating repeating groups.

Second Normal Form (2NF)

For a table to be in second normal form, the following 2 conditions must be met:

The table should be in the first normal form.
The primary key of the table should have exactly 1 column.

The first point is obviously straightforward since we just studied 1NF. Let us understand the second point: a 1-column primary key. A primary key is a set of columns that uniquely identifies a row. Here, no 2 rows have the same primary keys.

Course code	Course venue	Instructor Name	Instructor’s phone number
CS101	Lecture Hall 20	Prof. George	+1 6514821924
CS152	Lecture Hall 21	Prof. Atkins	+1 6519272918
CS154	CS Auditorium	Prof. George	+1 6514821924

In this table, the course code is unique so that becomes our primary key. Let us take another example of storing student enrollment in various courses. Each student may enroll in multiple courses. Similarly, each course may have multiple enrollments. A sample table may look like this (student name and course code):

Student name	Course code
Rahul	CS152
Rajat	CS101
Rahul	CS154
Raman	CS101

Here, the first column is the student name and the second column is the course taken by the student.

Clearly, the student name column isn’t unique as we can see that there are 2 entries corresponding to the name ‘Rahul’ in row 1 and row 3. Similarly, the course code column is not unique as we can see that there are 2 entries corresponding to course code CS101 in row 2 and row 4.

However, the tuple (student name, course code) is unique since a student cannot enroll in the same course more than once. So, these 2 columns when combined form the primary key for the database.

As per the second normal form definition, our enrollment table above isn’t in the second normal form. To achieve the same (1NF to 2NF), we can rather break it into 2 tables:

Students:

Student name	Enrolment number
Rahul	1
Rajat	2
Raman	3

Here the second column is unique and it indicates the enrollment number for the student. Clearly, the enrollment number is unique. Now, we can attach each of these enrollment numbers with course codes.

Courses:

Course code	Enrolment number
CS101	2
CS101	3
CS152	1
CS154	1

These 2 tables together provide us with the exact same information as our original table.

Third Normal Form (3NF)

Before we delve into the details of third normal form, let us understand the concept of a functional dependency on a table.

Column A is said to be functionally dependent on column B if changing the value of A may require a change in the value of B. As an example, consider the following table:

Course code	Course venue	Instructor's name	Department
MA214	Lecture Hall 18	Prof. George	CS Department
ME112	Auditorium building	Prof. John	Electronics Department

Here, the department column is dependent on the professor name column. This is because if in a particular row, we change the name of the professor, we will also have to change the department value. As an example, suppose MA214 is now taken by Prof. Ronald who happens to be from the mathematics department, the table will look like this:

Course code	Course venue	Instructor's name	Department
MA214	Lecture Hall 18	Prof. Ronald	Mathematics Department
ME112	Auditorium building	Prof. John	Electronics Department

Here, when we changed the name of the professor, we also had to change the department column. This is not desirable since someone who is updating the database may remember to change the name of the professor, but may forget updating the department value. This can cause inconsistency in the database.

Third normal form avoids this by breaking this into separate tables:

Course code	Course venue	Instructor's ID
MA214	Lecture Hall 18	1
ME112	Auditorium building,	2

Here, the third column is the ID of the professor who’s taking the course.

Instructor's ID	Instructor's Name	Department
1	Prof. Ronald	Mathematics Department
2	Prof. John	Electronics Department

Here, in the above table, we store the details of the professor against his/her ID. This way, whenever we want to reference the professor somewhere, we don’t have to put the other details of the professor in that table again. We can simply use the ID.

Therefore, in the third normal form, the following conditions are required:

The table should be in the second normal form.
There should not be any functional dependency.

Boyce-Codd Normal Form (BCNF)

The Boyce-Codd Normal form is a stronger generalization of the third normal form. A table is in Boyce-Codd Normal form if and only if at least one of the following conditions are met for each functional dependency A → B:

A is a superkey
It is a trivial functional dependency.

Let us first understand what a superkey means. To understand BCNF in DBMS, consider the following BCNF example table:

Course code	Course venue	Instructor Name	Instructor’s phone number
CS101	Lecture Hall 20	Prof. George	+1 6514821924
CS152	Lecture Hall 21	Prof. Atkins	+1 6519272918
CS154	CS Auditorium	Prof. George	+1 6514821924

Here, the first column (course code) is unique across various rows. So, it is a superkey. Consider the combination of columns (course code, professor name). It is also unique across various rows. So, it is also a superkey. A superkey is basically a set of columns such that the value of that set of columns is unique across various rows. That is, no 2 rows have the same set of values for those columns. Some of the superkeys for the table above are:

Course code
Course code, professor name
Course code, professor mobile number

A superkey whose size (number of columns) is the smallest is called a candidate key. For instance, the first superkey above has just 1 column. The second one and the last one have 2 columns. So, the first superkey (Course code) is a candidate key.

Boyce-Codd Normal Form says that if there is a functional dependency A → B, then either A is a superkey or it is a trivial functional dependency. A trivial functional dependency means that all columns of B are contained in the columns of A. For instance, (course code, professor name) → (course code) is a trivial functional dependency because when we know the value of course code and professor name, we do know the value of course code and so, the dependency becomes trivial.

Let us understand what’s going on:

A is a superkey: this means that only and only on a superkey column should it be the case that there is a dependency of other columns. Basically, if a set of columns (B) can be determined knowing some other set of columns (A), then A should be a superkey. Superkey basically determines each row uniquely.

It is a trivial functional dependency: this means that there should be no non-trivial dependency. For instance, we saw how the professor’s department was dependent on the professor’s name. This may create integrity issues since someone may edit the professor’s name without changing the department. This may lead to an inconsistent database.

Another example would be if a company had employees who work in more than one department. The corresponding database can be decomposed into where the functional dependencies could be such keys as employee ID and employee department.

Fourth normal form

A table is said to be in fourth normal form if there is no two or more, independent and multivalued data describing the relevant entity.

Fifth normal form

A table is in fifth normal form if:

It is in its fourth normal form.
It cannot be subdivided into any smaller tables without losing some form of information.

Normalization is Important for Database Systems

Normalization in DBMS is useful for designing the schema of a database such that there is no data replication which may possibly lead to inconsistencies. While designing the schema for applications, we should always think about how we can make use of these forms.

If you want to learn more about SQL, check out our post on the best SQL certifications. You can also read about SQL vs MySQL to learn about what the two are. To become a data engineer, you’ll need to learn about normalization and a lot more, so get started today.

Frequently Asked Questions

1. Does database normalization reduce the database size?

Yes, database normalization does reduce database size. Redundant data is removed, so the database disk storage use becomes smaller.

2. Which normal form can remove all the anomalies in DBMS?

5NF will remove all anomalies. However, generally, most 3NF tables will be free from anomalies.

3. Can database normalization reduce the number of tables?

Database normalization increases the number of tables. This is because we split tables into sub-tables in order to eliminate redundant data.

4. What is the Difference between BCNF and 3NF?

BCNF is an extension of 3NF. The primary difference is that it removes the transitive dependency from a relation.

People are also reading:

Beginner Courses

Intermediate Courses

Topics

Project Topics

Popular Technologies

Popular Articles

Topics

Beginner Courses

Intermediate Courses

Python

Web Development

Data Analysis

Python

Popular Projects

HTML

Popular Projects

JavaScript

Popular Projects

Java

Popular Projects

C++

Popular Projects

React

Popular Projects

PHP

Popular Projects

Arduino

Popular Projects

Python

Courses

Articles

HTML

Courses

Articles

JavaScript

Courses

Articles

Linux

Courses

Articles

Docker

Courses

Articles

Crypto

Courses

Articles

Projects

Blog

Code Alongs

Cheat Sheets

User-Submitted Resources

Code Editors

AI Tools

Normalization in DBMS: 1NF, 2NF, 3NF, and BCNF [Examples]

What is Normalization in DBMS?

Why Do We Carry out Database Normalization?

What is a Key?

Database Normalization Example

Types of DBMS Normalization

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form (BCNF)

Fourth normal form

Fifth normal form

Normalization is Important for Database Systems

Frequently Asked Questions

1. Does database normalization reduce the database size?

2. Which normal form can remove all the anomalies in DBMS?

3. Can database normalization reduce the number of tables?

4. What is the Difference between BCNF and 3NF?

Learn More