Data Science

What is Data Modeling?

Posted in Data Science
What is Data Modeling?

What is Data Modeling?

A data model represents the framework of the relationships within the database. The framework in the world of data warehousing is a critical component as it provides the structure which supports the analytical needs of the decision-makers.

The building of a data model is a critical step in the design of the data warehouse; this step requires business stakeholders to engage in the process by participating in facilitated sessions with the data architects.

The data model is just a shell without populated data. It is nothing more than a drawing. The real expense and heavy lifting in the data warehouse projects are not in the design of the model, but rather populating the model from different sources. A data model can be thought of as a guide while the user designs their data warehouse as they are going to have many deviations from what the vendor might consider as "standard."

Key Steps Involved in the Data Modeling Process

  • Identify the entity or business objects that are represented in the data set being modeled.
  • Identify the key property for each entity so you can differentiate between them in the data model.
  • Draw a rough draft of an entity-relationship model to show the connections between entities.
  • Identify the various data attributes that need to be incorporated into the data model.
  • Map the attributes to entities, so the model property reflects the business meaning of the data.
  • Finalize and validate the data model and refine it to keep it up to date as the data changes.

Uses of a Data Model

  • Helps Improve data analysis by the visual representation of data by providing a holistic picture of data that developers use for creating a physical database.
  • A data model contains all critical data of an enterprise. Thus, it reduces the chances of data omission, thereby avoiding faulty reports and incorrect results.
  • Business requirements are portrayed better by a data model.
  • It assists in the creation of a robust design that brings the entire data of an organization on the same platform. It helps in identifying the redundant, duplicate, and missing data as well.
  • A qualified data model helps in providing better consistency across all projects of an enterprise.
  • The data model improves data quality.
  • It helps Project Managers with a better scope and quality management and improves performance to the core.
  • It defines relational tables, stored procedures, and primary and foreign keys.

The Perspective of the Data Model

1. Conceptual Data Model

The model defines WHAT does the system contains. The purpose of the model is to establish the relationship between entities and their attributes; there is not much detail available of the actual database structure in this data modeling structure. The aim is to organize, scope and define business concepts and rules. It is usually created by Data Architects and stakeholders.

Data Model has three basic tenants:

  • Entity: A real-world thing
  • Attribute: Characteristics or properties of an entity.
  • Relationship: Dependency or association between two entities.

Example

Consider two products Customer and Product.

  • Customer name and number are the attributes of the Customer entity
  • Product name and price are the attributes of the product entity
  • Sale is a relationship between customer and product

Conceptual Data Model

Characteristics of a Conceptual Model

  • Offers full coverage of the business concepts to any organization
  • The model is mainly developed and designed for the business audience
  • The model is independent of hardware specifications like storage capacity, location, or software specifications like DBMS vendor and technology. It focuses on representing data as a user perceives it in the real world

2. Logical Data Model

The logical data models add further information to the conceptual model elements by defining the structure of the data elements and setting the relationships between them.

The model broadly includes all kinds of data that need to be captured such as tables, columns, etc. This model is generally designed by Business Analysts and Data Architects.

The logical data model offers an advantage that is to provide a foundation to form the base for the Physical model. However, the modeling structure remains generic.

Logical Data Model

No primary or secondary key is defined at this data modeling level, the user only needs to verify and adjust the connector details that were set earlier for relationships.

Characteristics of the Logical Model

  • Data need for a single project is described but can integrate with other logical models based on the scope of the project.
  • It is developed and designed independently from the DBMS.
  • Data attributes have datatypes with exact length and precision.
  • Normalization processes to the model are applied mostly till 3NF.

3. Physical Data Model

The Physical Data Model describes the database-specific implementation of the data model. The model defines the implementation of a data model with the help of the database management system. It outlines the implementation methodology in terms of tables, CRUD operations, indexes, partitioning, etc. Database Administrators and Developers create it.

Physical Data Model

Characteristics of Physical Model

  • Data need for a single project is described but can integrate with other physical models based on the scope of the project.
  • It contains relationships between tables that address nullability and cardinality of the relationships.
  • They are developed specifically for a version of a DBMS, location, data storage, or technology.
  • Columns have exact datatypes, lengths assigned, and default values.
  • Defines Primary and Foreign keys, views, indexes, access profiles, and authorizations, and more

Types of Data Models

1. Hierarchical Model

This database modeling is based on a tree-like structuring, each record has a single root or parent and sibling records are sorted in a particular order. This order is used as the physical order for storing the database. This type of modeling is used for many real-world model relationships. The nodes in this model can only have a one-to-many relationship between them. IBM's Information Management System (IMS) is a primary example of the hierarchical approach, which found wide use in businesses, especially in banking.

Example:

Let us say we have a few courses and a number of students. Each course can be assigned to a single student only. However, a student can take as many courses as he wants.

Hierarchical Model

2. Relational Data Modeling

The relationships and data are represented by a collection of inter-related tables in this model. Each table is a group of columns and rows, where the column represents an attribute of an entity, and rows represent records.

Example :

ID NAME AGE
101 May 22
105 John 25
121 Park 28
125 James 32
ID COURSE_ID COURSE_NAME
105 C45 Computer Architecture
105 C80 Algorithms
121 C76 Javascript
121 C67 Java

ID, NAME, and AGE are the attributes of the student table and ID, COURSE_ID and COURSE_NAME are the attributes of the table Course. The rows with values are records also called tuples.

3. The Entity-Relationship Model

An Entity-relationship model (ER model) uses diagrams to graphically depict the elements in a database and to ease the understanding of underlying models. An ER model is a blueprint of a database that can be implemented as a database table/schema. Entity sets and relationship sets are the main components of the ER model.

Entity Relationship Diagram (ER Diagram)

An ER diagram depicts the relationship among entity sets. The collection of similar entities is an entity set and these entities can have attributes.

Components Of ER Diagram

Components Of ER Diagram

ER diagram consists of three main components:

  1. Entity
  2. Attribute
  3. Relationship
1. Entity

Entity

An entity is a component of data, and a rectangle represents in the ER model — the following ER diagram, we have two entities Student and College. The two entities have many to one relationship because students study in a single college.

2. Weak Entity

An entity not having the ability to be uniquely identified by its attributes is a weak entity. Weak entities rely on the relationship with other entity. A double rectangle represents a weak entity in the ER diagram. For example – a bank account cannot be uniquely identified without knowing the bank to which the account belongs, so a bank account is a weak entity.

Weak Entity

3. Attribute

The property of an entity is described by an attribute. An attribute is represented as Oval in an ER diagram.

4. Relationship

The relationship among entities is depicted by a diamond shape.

4. Graph Model

Graph model, together with graph databases, has found an increased use for describing complex relationships within data sets, particularly in social media, recommender, and fraud detection applications. Scientists describe their system as a connected graph of relationships and nodes using the graph model. Graph data models are used for text analysis, creating models that uncover relationships among data points within documents.

Advantages of Data Modeling

  • To represent data objects offered by the functional team accurately is the primary purpose behind designing a data model.
  • The detailed data model is required for building the database.
  • Relationships between tables, primary and foreign keys, and stored procedures are defined with the help information defined by data models.
  • Data Model helps the business to communicate across organizations.
  • Data mappings in the ETL process are documented by the data model.
  • Correct sources of data to populate the model is recognized with the help of Data model

Disadvantages of Data Modeling

  • To develop a Data model, one should know physical data stored characteristics.
  • It is a navigational system that produces complex application development, management.
  • Modification in the entire application is required even if a smaller change is made in the structure.
  • There is no set of data manipulation language in DBMS.

Conclusion

The process of developing a data model for the storage of data in the Database is data modeling. They ensure consistency in default values, semantics, naming conventions, and security by providing the quality of data. The model structure helps to define the relational tables, foreign and primary keys, and stored procedures. Conceptual, logical, and physical types of data models exist. The primary aim of a designing data model is to make sure that data objects offered by the functional team are represented accurately. The biggest drawback is that changing the structure minutely also requires modification in the entire application.

People are also reading:

Simran Kaur Arora

Simran Kaur Arora

Simran, born in Delhi, did her schooling and graduation from India in Computer Science. Curious and passionate about technology urged her to study for an MS in the same from the renowned Silicon Valley, California, USA. Graduated in 2017, she flew back to India and now works for hackr.io as a freelance technical writer. View all posts by the Author

Leave a comment

Your email will not be published
Cancel