Disclosure: Hackr.io is supported by its audience. When you purchase through links on our site, we may earn an affiliate commission.
What is Data Modeling?
Table of Contents
What is Data Modeling?
A data model represents the framework of the relationships within the database. Data warehousing is a critical component as it provides the structure that supports the decision-makers analytical needs.
The building of a data model is a critical step in the design of the data warehouse; this step requires business stakeholders to engage in the process by participating in facilitated sessions with the data architects.
The data model is just a shell without populated data. It is nothing more than a drawing. The real expense and heavy lifting in the data warehouse projects are not in the model's design but rather in populating the model from different sources. A data model can be thought of as a guide while the user designs their data warehouse as they will have many deviations from what the vendor might consider as "standard."
Key Steps Involved in the Data Modeling Process
- Identify the entity or business objects that are represented in the data set being modeled.
- Identify the key property for each entity so you can differentiate between them in the data model.
- Draw a rough draft of an entity-relationship model to show the connections between entities.
- Identify the various data attributes that need to be incorporated into the data model.
- Map the attributes to entities, so the model property reflects the business meaning of the data.
- Finalize and validate the data model and refine it to keep it up to date as the data changes.
Uses of a Data Model
- Helps Improve data analysis by the visual representation of data by providing a holistic picture of data that developers use for creating a physical database.
- A data model contains all critical data of an enterprise. Thus, it reduces the chances of data omission, thereby avoiding faulty reports and incorrect results.
- Business requirements are portrayed better by a data model.
- It assists in creating a robust design that brings the entire data of an organization on the same platform. It helps in identifying the redundant, duplicate, and missing data as well.
- A qualified data model helps in providing better consistency across all projects of an enterprise.
- The data model improves data quality.
- It helps Project Managers with better scope and quality management and improves performance to the core.
- It defines relational tables, stored procedures, and primary and foreign keys.
The Perspective of the Data Model
1. Conceptual Data Model
The model defines WHAT does the system contains. The purpose of the model is to establish the relationship between entities and their attributes; there is not much detail available of the actual database structure in this data modeling structure. The aim is to organize, scope, and define business concepts and rules. Data Architects and stakeholders usually create it.
Data Model has three basic tenants:
- Entity: A real-world thing
- Attribute: Characteristics or properties of an entity.
- Relationship: Dependency or association between two entities.
Consider two products, Customer and Product.
- Customer name and number are the attributes of the Customer entity
- Product name and price are the attributes of the product entity
- Sale is a relationship between customer and product
Characteristics of a Conceptual Model
- Offers full coverage of the business concepts to any organization
- The model is mainly developed and designed for the business audience
- The model is independent of hardware specifications like storage capacity, location, or software specifications like DBMS vendor and technology. It focuses on representing data as a user perceives it in the real world.
2. Logical Data Model
The logical data models add further information to the conceptual model elements by defining the structure of the data elements and setting the relationships between them.
The model broadly includes all kinds of data that need to be captured, such as tables, columns, etc. Business Analysts and Data Architects generally design this model.
The logical data model offers an advantage to provide a foundation to form the base for the Physical model. However, the modeling structure remains generic.
No primary or secondary key is defined at this data modeling level; the user only needs to verify and adjust the connector details set earlier for relationships.
Characteristics of the Logical Model
- Data need for a single project is described but can integrate with other logical models based on the project's scope.
- It is developed and designed independently from the DBMS.
- Data attributes have data types with exact length and precision.
- Normalization processes to the model are applied mostly till 3NF.
3. Physical Data Model
The Physical Data Model describes the database-specific implementation of the data model. The model defines the implementation of a data model with the help of the database management system. It outlines the implementation methodology in terms of tables, CRUD operations, indexes, partitioning, etc. Database Administrators and Developers create it.
Characteristics of Physical Model
- Data need for a single project is described but can integrate with other physical models based on the project's scope.
- It contains relationships between tables that address the nullability and cardinality of the relationships.
- They are developed specifically for a version of a DBMS, location, data storage, or technology.
- Columns have exact data types, lengths assigned, and default values.
- Defines Primary and Foreign keys, views, indexes, access profiles, and authorizations, and more
Mastering Data Modeling Fundamentals
Types of Data Models
1. Hierarchical Model
This database modeling is based on a tree-like structuring, each record has a single root or parent, and sibling records are sorted in a particular order. This order is used as the physical order for storing the database. This type of modeling is used for many real-world model relationships. The nodes in this model can only have a one-to-many relationship between them. IBM's Information Management System (IMS) is a primary example of the hierarchical approach, which found wide use in businesses, especially in banking.
Let us say we have a few courses and several students. Each course can be assigned to a single student only. However, a student can take as many courses as he wants.
2. Relational Data Modeling
The relationships and data are represented by a collection of inter-related tables in this model. Each table is a group of columns and rows, where the column represents an attribute of an entity, and rows represent records.
ID, NAME, and AGE are the attributes of the student table, and ID, COURSE_ID, and COURSE_NAME are the table course attributes. The rows with values are records, also called tuples.
3. The Entity-Relationship Model
An Entity-relationship model (ER model) uses diagrams to depict the elements in a database graphically and ease the understanding of underlying models. An ER model is a blueprint of a database that can be implemented as a database table/schema. Entity sets and relationship sets are the main components of the ER model.
Entity Relationship Diagram (ER Diagram)
An ER diagram depicts the relationship among entity sets. The collection of similar entities is an entity set, and these entities can have attributes.
Components Of ER Diagram
ER diagram consists of three main components:
An entity is a component of data, and a rectangle is represented in the ER model — the following ER diagram has two entities: Student and College. The two entities have many to one relationship because students study in a single college.
2. Weak Entity
An entity not having the ability to be uniquely identified by its attributes is a weak entity. Weak entities rely on the relationship with other entities. A double rectangle represents a weak entity in the ER diagram. For example – a bank account cannot be uniquely identified without knowing the bank to which the account belongs, so a bank account is a weak entity.
An attribute describes the property of an entity. An attribute is represented as Oval in an ER diagram.
A diamond shape depicts the relationship among entities.
4. Graph Model
Graph model, together with graph databases, has increased use for describing complex relationships within data sets, particularly in social media, recommender, and fraud detection applications. Scientists describe their system as a connected graph of relationships and nodes using the graph model. Graph data models are used for text analysis, creating models that uncover relationships among data points within documents.
Advantages of Data Modeling
- To represent data objects offered by the functional team accurately is the primary purpose behind designing a data model.
- A detailed data model is required for building the database.
- Relationships between tables, primary and foreign keys, and stored procedures are defined with the help of information defined by data models.
- Data Model helps the business to communicate across organizations.
- The data model documents data mappings in the ETL process.
- Correct sources of data to populate the model are recognized with the help of the Data model.
Disadvantages of Data Modeling
- To develop a Data model, one should know physical data stored characteristics.
- It is a navigational system that produces complex application development, management.
- Modification in the entire application is required even if a smaller change is made in the structure.
- There is no set of data manipulation languages in DBMS.
The process of developing a data model for the storage of data in the Database is data modeling. They ensure consistency in default values, semantics, naming conventions, and security by providing data quality. The model structure helps to define the relational tables, foreign and primary keys, and stored procedures. Conceptual, logical, and physical types of data models exist. The primary aim of designing a data model is to ensure that the functional team's data objects are represented accurately. The biggest drawback is that changing the structure minutely also requires modification in the entire application.
People are also reading:
Leave a comment