We are Hiring Tech Content Writers (Freelancer/Full-Time). Are you interested? Apply Now

DBMS Normalization: 1NF, 2NF, 3NF and BCNF with Examples

DBMS Normalization

Hackr.io.

Spread the love

When developing the schema of a relational database, one of the most important aspect to be taken into account is to ensure that the duplication is minimized. This is done for 2 purposes:

  • Reducing the amount of storage needed to store the data.
  • Avoiding unnecessary data conflicts that may creep in because of multiple copies of the same data getting stored.

Normalization in DBMS

Database Normalization is a technique that helps in designing the schema of the database in an optimal manner so as to ensure the above points. The core idea of database normalization is to divide the tables into smaller subtables and store pointers to data rather than replicating it. For a better understanding of what we just said, here is a simple Normalization example:

To understand normalization in database with example tables, let’s assume that we are supposed to store the details of courses and instructors in a university. Here is what a sample database could look like:

Course codeCourse venueInstructor NameInstructor’s phone number
CS101Lecture Hall 20Prof. George+1 6514821924
CS152Lecture Hall 21
Prof. Atkins+1 6519272918
CS154CS AuditoriumProf. George +1 6514821924

Here, the data basically stores the course code, course venue, instructor name, and instructor’s phone number. At first, this design seems to be good. However, issues start to develop once we need to modify information. For instance, suppose, if Prof. George changed his mobile number. In such a situation, we will have to make edits in 2 places. What if someone just edited the mobile number against CS101, but forgot to edit it for CS154? This will lead to stale/wrong information in the database.

This problem, however, can be easily tackled by dividing our table into 2 simpler tables:

Table 1 (Instructor):

  • Instructor ID
  • Instructor Name
  • Instructor mobile number

Table 2 (Course):

  • Course code
  • Course venue
  • Instructor ID

Now, our data will look like the following:

Table 1 (Instructor):

Insturctor's IDInstructor's nameInstructor's number
1
Prof. George+1 6514821924
2
Prof. Atkins+1 6519272918

Table 2 (Course):

Course codeCourse venueInstructor ID
CS101Lecture Hall 20
1
CS152Lecture Hall 21
2
CS154CS Auditorium
1

Basically, we store the instructors separately and in the course table, we do not store the entire data of the instructor. We rather store the ID of the instructor. Now, if someone wants to know the mobile number of the instructor, he/she can simply look up the instructor table. Also, if we were to change the mobile number of Prof. George, it can be done in exactly one place. This avoids the stale/wrong data problem.

Further, if you observe, the mobile number now need not be stored 2 times. We have stored it at just 1 place. This also saves storage. This may not be obvious in the above simple example. However, think about the case when there are hundreds of courses and instructors and for each instructor, we have to store not just the mobile number, but also other details like office address, email address, specialization, availability, etc. In such a situation, replicating so much data will increase the storage requirement unnecessarily.

The above is a simplified example of how database normalization works. We will now more formally study it.

Types of Normalization

There are various database “Normal” forms. Each normal form has an importance which helps in optimizing the database to save storage and to reduce redundancies.

First Normal Form (1NF)

The First normal form simply says that each cell of a table should contain exactly one value. Let us take an example. Suppose we are storing the courses that a particular instructor takes, we can store it like this:

Instructor's nameCourse code
Prof. George
(CS101, CS154)
Prof. Atkins (CS152)

Here, the issue is that in the first row, we are storing 2 courses against Prof. George. This isn’t the optimal way since that’s now how SQL databases are designed to be used. A better method would be to store the courses separately. For instance:

Instructor's nameCourse code
Prof. George
CS101
Prof. George
CS154
Prof. Atkins
CS152

This way, if we want to edit some information related to CS101, we do not have to touch the data corresponding to CS154. Also, observe that each row stores unique information. There is no repetition. This is the First Normal Form.

Second Normal Form (2NF)

For a table to be in second normal form, the following 2 conditions are to be met:

  1. The table should be in the first normal form.
  2. The primary key of the table should compose of exactly 1 column.

The first point is obviously straightforward since we just studied 1NF. Let us understand the first point – 1 column primary key. Well, a primary key is a set of columns that uniquely identifies a row. Basically, no 2 rows have the same primary keys. Let us take an example.

Course codeCourse venueInstructor NameInstructor’s phone number
CS101Lecture Hall 20Prof. George+1 6514821924
CS152Lecture Hall 21
Prof. Atkins+1 6519272918
CS154CS AuditoriumProf. George +1 6514821924

Here, in this table, the course code is unique. So, that becomes our primary key. Let us take another example of storing student enrollment in various courses. Each student may enrol in multiple courses. Similarly, each course may have multiple enrollments. A sample table may look like this (student name and course code):

Student nameCourse code
Rahul

CS152
Rajat
CS101
Rahul
CS154
RamanCS101

Here, the first column is the student name and the second column is the course taken by the student. Clearly, the student name column isn’t unique as we can see that there are 2 entries corresponding to the name ‘Rahul’ in row 1 and row 3. Similarly, the course code column is not unique as we can see that there are 2 entries corresponding to course code CS101 in row 2 and row 4. However, the tuple (student name, course code) is unique since a student cannot enroll in the same course more than once. So, these 2 columns when combined form the primary key for the database.

As per the second normal form definition, our enrollments table above isn’t in second normal form. To achieve the same (1NF to 2NF), we can rather break it into 2 tables:

Students:

Student nameEnrolment number
Rahul1
Rajat2
Raman3

Here the second column is unique and it indicates the enrollment number for the student. Clearly, the enrollment number is unique. Now, we can attach each of these enrollment numbers with course codes

Courses:

Course codeEnrolment number
CS1012
CS1013
CS1521
CS1541

These 2 tables together provide us with the exact same information as our original table.

Third Normal Form (3NF)

Before we delve into details of third normal form, let us understand the concept of a functional dependency on a table.

Column A is said to be functionally dependent on column B if changing the value of A may require a change in the value of B. As an example, consider the following table:

Course codeCourse venueInstructor's nameDepartment
MA214Lecture Hall 18
Prof. GeorgeCS Department
ME112Auditorium building
Prof. John Electronics Department

Here, the department column is dependent on the professor name column. This is because if in a particular row, we change the name of the professor, we will also have to change the department value. As an example, suppose MA214 is now taken by Prof. Ronald who happens to be from the Mathematics department, the table will look like this:

Course codeCourse venueInstructor's nameDepartment
MA214Lecture Hall 18Prof. Ronald
Mathematics Department
ME112Auditorium buildingProf. John Electronics Department

Here, when we changed the name of the professor, we also had to change the department column. This is not desirable since someone who is updating the database may remember to change the name of the professor, but may forget updating the department value. This can cause inconsistency in the database.

Third normal form avoids this by breaking this into separate tables:

Course codeCourse venueInstructor's ID
MA214
Lecture Hall 181
ME112 Auditorium building,
2

Here, the third column is the ID of the professor who’s taking the course.

Instructor's IDInstructor's NameDepartment
1
Prof. RonaldMathematics Department
2Prof. John
Electronics Department

Here, in the above table, we store the details of the professor against his/her ID. This way, whenever we want to reference the professor somewhere, we don’t have to put the other details of the professor in that table again. We can simply use the ID.

Therefore, in the third normal form, the following conditions are required:

  • The table should be in the second normal form.
  • There should not be any functional dependency.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal form is a stronger generalization of third normal form. A table is in Boyce-Codd Normal form if and only if at least one of the following conditions are met for each functional dependency A → B:

  • A is a superkey
  • It is a trivial functional dependency.

Let us first understand what a superkey means. To understand BCNF in DBMS, consider the following BCNF example table:

Course codeCourse venueInstructor NameInstructor’s phone number
CS101Lecture Hall 20Prof. George+1 6514821924
CS152Lecture Hall 21
Prof. Atkins+1 6519272918
CS154CS AuditoriumProf. George +1 6514821924

Here, the first column (course code) is unique across various rows. So, it is a superkey. Consider the combination of columns (course code, professor name). It is also unique across various rows. So, it is also a superkey. A superkey is basically a set of columns such that the value of that set of columns is unique across various rows. That is, no 2 rows have the same set of values for those columns.

Some of the superkeys for the table above are:

  • Course code
  • Course code, professor name
  • Course code, professor mobile number

A superkey whose size (number of columns) is the smallest is called as a candidate key. For instance, the first superkey above has just 1 column. The second one and the last one have 2 columns. So, the first superkey (Course code) is a candidate key.

Boyce-Codd Normal Form says that if there is a functional dependency A → B, then either A is a superkey or it is a trivial functional dependency. A trivial functional dependency means that all columns of B are contained in the columns of A. For instance, (course code, professor name) → (course code) is a trivial functional dependency because when we know the value of course code and professor name, we do know the value of course code and so, the dependency becomes trivial.

Let us understand what’s going on:

A is a superkey: this means that only and only on a superkey column should it be the case that there is a dependency of other columns. Basically, if a set of columns (B) can be determined knowing some other set of columns (A), then A should be a superkey. Superkey basically determines each row uniquely.

It is a trivial functional dependency: this means that there should be no non-trivial dependency. For instance, we saw how the professor’s department was dependent on the professor’s name. This may create integrity issues since someone may edit the professor’s name without changing the department. This may lead to an inconsistent database.

There are also 2 other normal forms:

Fourth normal form

A table is said to be in fourth normal form if there is no two or more, independent and multivalued data describing the relevant entity.

Fifth normal form

A table is in fifth Normal Form if:

  • It is in fourth normal form.
  • It cannot be subdivided into any smaller tables without losing some form of information.

Summary

The various forms of database normalization are useful while designing the schema of a database in such a way that there is no data replication which may possibly lead to inconsistencies. While designing schema for applications, we should always think about how can we make use of these forms.

Related Posts

Your email address will not be published. Required fields are marked *

*

38 Comments, RSS

  1. Mary Brown December 4, 2018 @ 11:39 am

    What are different types of normalization?

    • Hackr Team

      Hackr Team December 4, 2018 @ 7:24 pm

      That is what the article is all about:)

  2. Annie Martinez December 4, 2018 @ 11:42 am

    What is lock in DBMS?

    • Rose Potter December 5, 2018 @ 9:44 pm

      Lock is the mechanism to prevent the overwriting of data. Once a process updates the data, it locks so that the other process can’t read that data until the update is finished. Database locks serve to protect shared resources or objects like tables, rows etc.Lock are of two kinds-
      • Binary Locks
      • Shared/Exclusive

  3. Betty Bryant December 4, 2018 @ 11:43 am

    Is star schema normalized or denormalized?

    • Sylvia Boone December 5, 2018 @ 9:45 pm

      In the Star schema, dimensions are denormalized. For example, if you have an employee dimension and the employee belongs to a particular department. Then in star schema, you will only have the employee table and repeat the department data for each employee. This will increase the data retrieval speed and save the storage.

  4. Emily Williams December 4, 2018 @ 11:45 am

    Are fact tables normalized or denormalized?

    • Eduardo Edwards December 5, 2018 @ 9:45 pm

      Fact tables are completely normalized because the redundant information is maintained in the dimensions table. Dimensions table can be normalized or denormalized. If anyone say that fact table is denormalized as it might contain duplicate foreign key then it would be partially correct to say denormalized. There can be some situations where fact table contains lot of columns. In that case, we can say that fact table is denormalized, but it would be much better to say that schema is denormalized.

  5. Mildred Russell December 4, 2018 @ 11:45 am

    What is the difference between normalization and denormalization?

    • Kristopher Howard December 5, 2018 @ 9:46 pm

      Normalization: The data is divided into multiple tables to achieve data integrity and data redundancy. This process is known as normalization.
      De-Normalization: When we apply the opposite process of normalization where the data from multiple tables are combined into one table to save the storage and data retrieval become faster. Data integrity may not retain in the denormalization and redundancy added into this.

  6. Nancy Morris December 4, 2018 @ 11:46 am

    What is the difference between dimension and fact table?

    • Marie Ramsey December 5, 2018 @ 9:48 pm

      Dimension and fact tables are used in data warehousing.
      Dimension Table: Dimension table contains dimensions of a fact. Dimension table is denormalized. These tables mainly consist descriptive attributes. A foreign key is used to join with the fact table. Dimensions have several types- Conformed Dimensions, Outrigger Dimensions, Shrunken Rollup Dimensions, Dimension-to-Dimension Table Joins, Role-Playing Dimensions, Junk Dimensions, Degenerate Dimensions, Swappable Dimensions, and Step Dimensions.
      Fact Table: Fact tables are the primary table in a dimension model which contains- facts, metrics, and measurements about a business process. Fact tables are normalized. There are 3 types of facts- Additive, Semi-additive, and Non-additive.

  7. Maria Clark December 4, 2018 @ 11:46 am

    What is the difference between database and data warehouse?

    • Al Alvarez December 5, 2018 @ 9:49 pm

      Database is the collection of data in the form of rows, columns, and tables that is indexed periodically to make relevant information more accessible. A database uses OLTP (Online Transaction Processing) to store current transactions and enable fast access to specific transactions. Whereas, the Data Warehouse is the system which pulls data together from multiple sources within an organization for analysis and reporting. A Data Warehouse using OLAP (Online Analytical Processing) to store large quantities of data and enable fast on complex queries across all the data.

  8. Rose Cooper December 4, 2018 @ 11:47 am

    What is granularity of data?

    • Virginia Hale December 5, 2018 @ 9:49 pm

      Granularity is the measurement of the level of detail. Granularity can be easily understood by the term of detail in a set of data. The greater the granularity, the deeper level of detail so the granular data means detailed data. Example of data granularity is how a name field is subdivided if it is contained in a single field or subdivided into its constituents such as first name, middle name and last name.

  9. Ruth Hill December 4, 2018 @ 11:48 am

    How long does it take to learn SQL?

    • Cary Patrick December 5, 2018 @ 9:51 pm

      There is no fix duration to learn SQL. It totally depends on your interest in learning it and your computer programming skills that will decide that how much time you will take to learn SQL. SQL is not very hard, so if you will start dedicatedly, you can learn fast. Start with the basics and practice the SQL statements. In-depth learning requires more practice. There are lots of learning materials available on the internet. Follow them and start your journey to the SQL DBA.

  10. Andrea Harris December 4, 2018 @ 11:56 am

    Real Time Application of BCNF?

    • Francis Vega December 5, 2018 @ 10:01 pm

      Example: Suppose there is a company wherein employees work in more than one department. They store the data like this:
      emp_id emp_nationality emp_dept dept_type dept_no_of_emp
      1001 Austrian Production and planning D001 200
      1001 Austrian stores D001 250
      1002 American design and technical support D134 100
      1002 American Purchasing department D134 600

      Functional dependencies in the table above:
      emp_id -> emp_nationality
      emp_dept -> {dept_type, dept_no_of_emp}
      Candidate key: {emp_id, emp_dept}

      To make the table comply with BCNF we can break the table in three tables like this:

      emp_nationality table:

      emp_id emp_nationality
      1001 Austrian
      1002 American

      emp_dept table:

      emp_dept dept_type dept_no_of_emp
      Production and planning D001 200
      stores D001 250
      design and technical support D134 100
      Purchasing department D134 600

      emp_dept_mapping table:

      emp_id emp_dept
      1001 Production and planning
      1001 Stores
      1002 design and technical support
      1002 Purchasing department

      Functional dependencies:
      emp_id -> emp_nationality
      emp_dept -> {dept_type, dept_no_of_emp}
      Candidate keys:
      For first table: emp_id
      For second table: emp_dept
      For third table: {emp_id, emp_dept}
      This is now in BCNF as in both the functional dependencies left side part is a key.

  11. Janice Garcia December 4, 2018 @ 11:57 am

    How to Convert 1 NF to 2 NF Example?

    • Lonnie Bell December 5, 2018 @ 10:06 pm

      Un-normalized
      Module Dept Lecturer Texts
      M1 D1 L1 T1, T2
      M2 D1 L1 T1, T3
      M3 D1 L2 T4
      M4 D2 L3 T1, T5

      1NF
      Module Dept Lecturer Text
      M1 D1 L1 T1
      M1 D1 L1 T2
      M2 D1 L1 T1
      M2 D1 L1 T3
      M3 D1 L2 T4
      M4 D2 L3 T1
      M4 D2 L3 T5

      2NFa
      Module Dept Lecturer
      M1 D1 L1
      M2 D1 L1
      M3 D1 L2
      M4 D2 L3

      2NFb
      Module Text
      M1 T1
      M1 T2
      M2 T1
      M2 T3
      M3 T4
      M4 T1
      M4 T5

  12. Lois Cox December 4, 2018 @ 12:01 pm

    What is MySQL Normalization?

    • Jenny Pierce December 5, 2018 @ 10:07 pm

      Database normalization was introduced as a procedure by Edgar Frank Codd, a computer scientist at IBM. Normalization applies to get rid of the dependencies and having minimal fields in the data table. Normalization is to make sure that all fields in the table only belongs to the one domain and avoid null fields.

  13. Wanda Lee December 4, 2018 @ 12:03 pm

    Difference between BCNF and 3NF?

    • Sandra Bowen December 5, 2018 @ 10:08 pm

      The difference between 3NF and BCNF is subtle.

      3NF
      Definition
      A relation is in 3NF if it is in 2NF and no non-prime attribute transitively depends on the primary key. In other words, a relation R is in 3NF if for each functional dependency X ⟶ A in R at least one of the following conditions are met:

      X is a key or superkey in R
      A is a prime attribute in R
      Example
      Given the following relation:

      EMP_DEPT(firstName, employeeNumber, dateOfBirth, address, departmentNumber, departmentName)

      An employee can only work in one department and each department has many employees.

      The candidate key is employeeNumber.

      Consider the following functional dependencies:

      employeeNumber ⟶ firstName, dateOfBirth, address, departmentNumber
      departmentNumber ⟶ departmentName
      Given the definition above it is possible to conclude that the relation EMP_DEPT is not in 3NF because the second functional dependency does not meet any of the 2 conditions of the 3NF:

      departmentNumber is not a key or superkey in EMP_DEPT
      departmentName is not a prime attribute in EMP_DEPT
      BCNF
      Definition
      A relation R is in BCNF if it is in 3NF and for each functional dependency X ⟶ A in R, X is a key or superkey in R. In other words, the only difference between 3NF and BCNF is that in BCNF it is not present the second condition of the 3NF. This makes BCNF stricter than 3NF as any relation that is in BCNF will be in 3NF but not necessarily every relation that is in 3NF will be in BCNF.

      Example
      Given the following relation:

      STUDENT_COURSE(studentNumber, socialSecurityNumber, courseNumber)

      A student can assist to many courses and in a course there can be many students.

      The candidate keys are:

      socialSecurityNumber, courseNumber
      studentNumber, courseNumber
      Consider the following functional dependencies:

      studentNumber ⟶ socialSecurityNumber
      socialSecurityNumber ⟶ studentNumber
      Given the definitioin above it is possible to conclude that STUDENT_COURSE is not in BCNF as at least studentNumber is not a key or superkey in STUDENT_COURSE.

      Source: https://stackoverflow.com/questions/19749913/what-is-the-difference-between-3nf-and-bcnf

  14. Peg Lee December 4, 2018 @ 7:26 pm

    What is the purpose or need of normalization in database?

    • Dwayne Hicks December 5, 2018 @ 10:09 pm

      Database normalization is the process of organizing data and minimizes the data redundancy. This is the main purpose of normalization. The basic need of normalization is to prevent anomalies from messing up the data. The reasons why we use data normalization are to minimize duplicate data, to minimize or avoid data modification issues, and to simplify queries.
      2 primary advantages of normalization:
      • Easier object to data mapping
      • Increase consistency

  15. Jack Graw December 4, 2018 @ 7:27 pm

    What is the alternative to database normalization?

    • Judy Peterson December 5, 2018 @ 10:11 pm

      There is no alternative to normalization. This depends on your application needs that it requires normalization or not. If you are working with or designing an OLTP application where more independent tables are actually given a benefit of storing data in the more optimal way. There is no requirement of normalization when reading the data from many normalized tables.
      There are other techniques available like star schema, denormalization etc. but it all depends on your need.

  16. Olive Yu December 4, 2018 @ 7:28 pm

    Can database normalization reduce number of tables?

    • Ann Neal December 5, 2018 @ 10:12 pm

      Normalization removes redundant data so sometimes it increases the number of tables.

  17. Gene Jacket December 4, 2018 @ 7:29 pm

    Which normal form can remove all the anomalies in DBMS?

    • Kristi Jackson December 5, 2018 @ 10:14 pm

      Normalization makes a table or relation free from insert/update/delete anomalies and saves the space by releasing the duplicate data. Basically, the 3NF is enough to remove all the anomalies from your database. Higher NFs can reduce the level and will affect maintaining all those tables and reporting with several JOINS.

  18. Doug Mather December 4, 2018 @ 7:29 pm

    Does database normalization reduce the database size?

    • Oliver Watson December 5, 2018 @ 10:15 pm

      Normalization removes the duplicate data and helps to keep the data error free. This helps to ensure that the size of the database doesn’t grow large with duplicate data. At the same time, the speed of some types of operations can be slower in a non-normalized form. Normalization increases the efficiency of the database.

  19. Maryam bibi December 24, 2018 @ 5:33 pm

    hello!
    what is the primary key in the table .
    it so confusing .