Database Design Concepts

People who work in database systems should have some knowledge in database design concepts. This article let you understand the high level process in designing a new relational database system. These concepts will help you in designing, developing, administrating and maintaining database systems. A SQL Developer / DBA with 5+ years of experience will be asked a common question in any interview is “What are the phases in database Design / Any Idea what are the steps needs to be taken to design a secure , scalable and optimized database system? / How to design a database?”. Here is the answer.

Database Design Concepts

Here we are going to learn below concepts that helps you in designing a new database.

Database Design Phases
Normalization
Types of Relationships
Database Models
Database Integrity

Database Design Phases

Requirement Specification and Analysis
Conceptual\Semantic Database Design
Implementation\Logical Schema Design
Physical Schema Design
Optimization \ Administration

Requirement Specification and Analysis:

In this phase a detailed analysis of the requirement is done. The objective of this phase is to get a clear understanding of the requirements.
The database designer’s first step is to draw up a data requirements document. The requirements document contains a concise and non-technical summary of what data items will be stored in the database, and how the various data items relate to one another.
Taking the ‘data requirements document’, further analysis is done to give meaning to the data items, e.g. define the more detailed attributes of the data and define constraints if needed. The result of this analysis is a ‘preliminary specifications’ document.
Some of the information gathering methods are:
- Interview
- Analyzing documents and existing system or data
- Survey
- Site visit
- Joint Applications Design (JAD) and Joint Requirements Analysis (JRA)
- Prototyping
- Discussions / Meetings
- Observing the enterprise in operation
- Research
- Questionnaire
From all these methods we need to:
Identify essential “real world” information
Remove redundant, unimportant details
Clarify unclear natural language statements
Fill remaining gaps in discussions
Distinguish data and operations
Go to the next phase to give the model for the existing data.

Conceptual Database Design:

The requirement analysis is modeled in this conceptual design. Conceptual database design involves modeling the collected information at a high-level of abstraction. The ER diagram is used to represent this conceptual design. ER diagram consists of Entities, Attributes and Relationships.

Allow easy communication between end-users and developers.
Has a clear method to convert from high-level model to relational model.
Conceptual schema is a permanent description of the database requirements
What should be the outcome from this phase:
Entities and relationships in the enterprise
Information about these entities and relationships that we need to store in the database
Integrity constraints or business rules that hold
Type of relationship or Cardinality (1:1, 1:N, N:M) should be defined
At the end of this phase the database `schema’ in the ER Model can be represented pictorially (ER diagrams).

Logical Schema Design:

Once the relationships and dependencies are identified the data can be arranged into logical structures and is mapped into database management system tables. Normalization is performed to make the relations in appropriate normal forms.

Map all entities, attributes and relationships from ER diagrams to relational database objects
Map regular entities
Map weak entities
Map binary relationships
Map associative entities
Map unary relationships
Map ternary relationships
Map supertype/subtype relationships
Find out the anomalies
Insertion: Data about an attribute inserted at more than one place (Courseid, sid, sname)
Deletion: Removing data of an attribute required delete operation at more than one place
Update: Updating data of an attribute required Update operation at more than one place
Identify the candidate\primary keys
Normalize all the relations in database be following the normal forms
First normal form (No Multivalued Dependency)
Second normal form (No Partial Dependency)
Third normal form (No Transitive Dependency)
Boyce-Codd normal form
Fourth Normal form
Fifth Normal form
Apply all constraints from ER diagrams to make sure that the database is integrated
Domain integrity – Allowable values to a attribute in the domain
Entity integrity – No primary key attribute may be null
Referential integrity – Data must be consistent (Primary foreign key matching) and enforcement on Delete (Cascade)

Physical Schema Design:

It deals with the physical implementation of the database in a database management system. It includes the specification of data elements, data types, indexing etc. All these information is stored in the data dictionary.

Information needed for physical file and database design includes:
Normalized relations plus size estimates for them
Definitions of each attribute
Descriptions of where and when data are used (entered, retrieved, deleted, updated, and how often
Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
Descriptions of the technologies used to implement the database

Things that should be considered in the physical schema design

Usage Type: OLTP, OLAP, Production, Dev, Test, etc
Data Storage: Choose the appropriate data type. Main Memory and Secondary Memory
Database Size (Data File and Log File growth expecting): Size of Relations -> tuples
Processing Time: Processors using and speed according to the expected growth
Operating System: OS capacity
Access Methods (Physical Sequential, Indexed Sequential, Indexed Random, Inverted, Direct, Hashed)
CRUD Matrix (Database Usage – Create, Retrieve, Update and Delete)
Security – Security considerations while storing data files
Disaster Recovery – Recovery planning according to the business
Indexing: Index primary, foreign keys, attributes that uses in filters and in sorts. Do not index attribute that are having few values.
Cashing
De-normalizing

Normalization

Normalization is a method of arranging the data elements in an organized format to optimize the data access. Normalization Avoids:

Duplication of Data
Insert Anomaly
Delete Anomaly
Update Anomaly

Normal Forms:

There are 5+ normalization rules (5 Normal Forms and Boyce-Codd), but most day-to-day database creation focuses on the first three. These rules are used to help create flexible, adaptable tables that have no redundancy and are easy to query. Before normalizing begin with a list of all of the fields that must appear in the database. Think of this as one big table.

First Normal Form:

Remove repeating groups by moving those groups into new tables.
Example1: Divide Name column into “First_Name” and “Last_Name”
Example2: One field called “Month” instead of 12 columns called “Jan”, “Feb” etc.

Second Normal Form:

Remove Partial Dependencies.
Partial Dependency A type of functional dependency where an attribute is functionally dependent on only part of the primary key (primary key must be a composite key). A column in a table is partially dependent on a primary key.
Ex: Below table is having a composite primary key on (StudentID and CourseID) and it is not in second normal form because the column StudentName is partially dependent on Primary Key (Only on StudentID)

StudentID, CourseID, StudentName,Grade

Remove partial dependency by dividing into two different tables.

StudentID, CourseID, Grade

and

StudentID, StudentName

Now above tables are in 2^nd normal form.

Third Normal Form:

Remove transitive dependencies. (C B A. Indirectly C A not directly)
Transitive Dependency is a type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.
Ex: In below table CourseID is primary key. But the column FacultyOffice is not directly depends on primary key and it depends on a non-primary key column “FacultyID”.

FacultyOffice FacultyID CourseID

CourseID, Section, FacultyID, FacultyOffice

Hence divide the table to remove the transitive dependency.

Types of Relationships

There are three types of table relationships. Each has its own unique purpose in helping to organize the data within the database. These relationships can be used to determine joins in queries as well.

1) One-to-One Relationships: one record in a table is related to one record in a related table; creates equally dependent tables

Ex. one student has only one PSU ID

*NOTE: This type of relationship is rarely used.

2) One-to-Many Relationships: one record in a primary table is related to many records in a related table; however, a record in the related table has only one related record in the primary table

Ex. a student can live in one residence hall at a given time, but many students can live in a residence hall at a given time

*NOTE: This is the most common type of relationship.

3) Many-to-Many Relationships: several records in the primary table are related to several records in a related table

Ex. one student can be enrolled in many subjects and each subject can have many students enrolled

Data / Database – Models

Data Modeling:

Data modeling is a method used to define and analyze data requirements needed to support the business processes of an organization and by defining the data structures and the relationships between data elements. There are three types:

Semantic /Conceptual Model
Logical Model
Physical Model

Database Models:

Databases appeared in the late 1960s, at a time when the need for a flexible information management system had arisen. There are five models of DBMS, which are distinguished based on how they represent the data contained:

Hierarchical
Network (CODASYL)
Object/Relational
Object-Oriented
Multi-dimensional
Semi Structured
Associative
Entity-Attribute-Value (EAV)
Context

Signs of Good Database Design:

Thoughtfully planned
Works for the intended situation and purpose
Streamlines a process
Shows consistency among data (Fall 05 vs. fall 2005)
Eliminates redundancy as much as possible, i.e. tables are normalized
Provides users with an easy way to enter, store, and retrieve data
Does NOT promote deleting data, but rather making designations or archiving it
Provides unique identifiers (primary keys) for records
Grows, changes, and improves as needed

Database Integrity

Enforcing data integrity ensures the quality of the data in the database. For example, if an employee is entered with an employee_id value of 123, the database should not allow another employee to have an ID with the same value. Data integrity falls into these categories:

Entity integrity
Domain integrity
Referential integrity
User-defined integrity

Entity Integrity:

The intention of entity integrity is to uniquely identify all the rows in a table. For this we need to add primary key to a column.

Domain Integrity:

A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a database system, the domain integrity is defined by:

The data type and the length
The NULL value acceptance
The allowable values, through techniques like constraints or rules
The default value We can use the check constraint to restrict the column values

Referential Integrity:

Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the linked table are reflected in the primary table. We can implement this by using foreign key constraints.

User-Defined:

Integrity User-defined integrity allows you to define specific business rules that do not fall into one of the other integrity categories. All of the integrity categories support user-defined integrity (all column- and table-level constraints in CREATE TABLE, stored procedures, and triggers).

References:

Five common errors in requirements analysis (and how to avoid them)

https://msdn.microsoft.com/en-us/library/cc505843.aspx

What is Database Design Methodology? Different Phases of Design Methodology.

http://www.jkinfoline.com/steps-in-database-design.html

https://msdn.microsoft.com/en-us/library/b42dwsa3(v=vs.80).aspx

http://docs.oracle.com/cd/A87860_01/doc/server.817/a76994/logical.htm

8 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

suresh

10 years ago

Thanks Uday..very useful info

Author

uday arumilli

Reply to suresh

Thanks Suresh,

Happy Reading
http://udayarumilli.com/

Pearl

Sorry, i’m nt getting how to make use of ur guidence.

Reply to Pearl

YHi,

Thanks for visiting our blog, database design concepts are for the people who works in database systems. Let us know if you are looking into specific concepts.

SQL DBA Interview Preparation

8 years ago

[…] Database Design […]

SureshP

7 years ago

Hi Uday,

Thanks for the useful information. Please share if you have design document, it will very helpful for me to design database…

Reply to SureshP

I’ll surely post the design document.

Harivamshi

5 years ago

Searched …searched …SQL interview questions atlast i got your PDF interview guide that was really awesome and also your experience in Microsoft was just amazing nice info from you hope u always guide share some knowledge o he people who are excited to learn new things..