Database Design Concepts
People who work with database systems should have some knowledge of database design concepts. This article walks you through the high-level process of designing a new relational database system. These concepts will help you design, develop, administer, and maintain database systems. A common interview question for a SQL Developer / DBA with 5+ years of experience is: “What are the phases in database design?”, “What steps need to be taken to design a secure, scalable, and optimized database system?”, or simply “How do you design a database?”. Here is the answer.
Database Design Concepts
Here we are going to learn the concepts below, which will help you design a new database.
- Database Design Phases
- Types of Relationships
- Database Models
- Database Integrity
Database Design Phases
- Requirement Specification and Analysis
- Conceptual / Semantic Database Design
- Implementation / Logical Schema Design
- Physical Schema Design
- Optimization / Administration
Requirement Specification and Analysis:
- In this phase a detailed analysis of the requirements is done. The objective of this phase is to get a clear understanding of the requirements.
- The database designer’s first step is to draw up a data requirements document. The requirements document contains a concise and non-technical summary of what data items will be stored in the database, and how the various data items relate to one another.
- Taking the ‘data requirements document’, further analysis is done to give meaning to the data items, e.g. define the more detailed attributes of the data and define constraints if needed. The result of this analysis is a ‘preliminary specifications’ document.
- Some of the information gathering methods are:
- Analyzing documents and existing system or data
- Site visit
- Joint Applications Design (JAD) and Joint Requirements Analysis (JRA)
- Discussions / Meetings
- Observing the enterprise in operation
- From all these methods we need to:
- Identify essential “real world” information
- Remove redundant, unimportant details
- Clarify unclear natural language statements
- Fill remaining gaps in discussions
- Distinguish data and operations
- Then move on to the next phase to build a model of the gathered data.
Conceptual Database Design:
The output of requirements analysis is modeled in the conceptual design. Conceptual database design involves modeling the collected information at a high level of abstraction. The ER diagram is used to represent this conceptual design; it consists of Entities, Attributes, and Relationships.
- Allows easy communication between end users and developers.
- Has a clear method for converting from the high-level model to the relational model.
- The conceptual schema is a permanent description of the database requirements.
- What should be the outcome from this phase:
- Entities and relationships in the enterprise
- Information about these entities and relationships that we need to store in the database
- Integrity constraints or business rules that hold
- Type of relationship or Cardinality (1:1, 1:N, N:M) should be defined
- At the end of this phase the database 'schema' in the ER Model can be represented pictorially (ER diagrams).
Logical Schema Design:
Once the relationships and dependencies are identified, the data can be arranged into logical structures and mapped onto database management system tables. Normalization is performed to bring the relations into appropriate normal forms.
- Map all entities, attributes and relationships from ER diagrams to relational database objects
- Map regular entities
- Map weak entities
- Map binary relationships
- Map associative entities
- Map unary relationships
- Map ternary relationships
- Map supertype/subtype relationships
- Find the anomalies
- Insertion: the same data must be inserted in more than one place (e.g. sname repeated for every Courseid a student sid enrolls in)
- Deletion: removing the data for one fact requires delete operations in more than one place
- Update: changing the value of an attribute requires update operations in more than one place
- Identify the candidate\primary keys
- Normalize all the relations in the database by following the normal forms
- First normal form (atomic values, no repeating groups)
- Second normal form (no partial dependencies)
- Third normal form (no transitive dependencies)
- Boyce-Codd normal form (every determinant is a candidate key)
- Fourth normal form (no multivalued dependencies)
- Fifth normal form (no join dependencies)
- Apply all constraints from the ER diagrams to ensure the integrity of the database
- Domain integrity – allowable values for an attribute in its domain
- Entity integrity – no primary key attribute may be null
- Referential integrity – data must be consistent (foreign key values match primary keys), with enforcement on delete (e.g. cascade)
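The domain and entity integrity constraints above can be declared directly in the table definition. Here is a minimal sketch using Python's built-in sqlite3 module; the Employees table and its columns are illustrative, not from the article:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Entity integrity: a primary key, so no two rows share an identifier.
# Domain integrity: data types, NOT NULL, and a CHECK on allowable values.
conn.execute("""
CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,       -- entity integrity
    Name       TEXT NOT NULL,             -- domain integrity (no NULLs)
    Salary     REAL CHECK (Salary >= 0)   -- domain integrity (allowable values)
)
""")
conn.execute("INSERT INTO Employees VALUES (123, 'Alice', 50000)")

# A second employee with the same ID violates entity integrity.
try:
    conn.execute("INSERT INTO Employees VALUES (123, 'Bob', 40000)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# A negative salary violates the domain (CHECK) constraint.
try:
    conn.execute("INSERT INTO Employees VALUES (124, 'Carol', -1)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Both bad inserts are rejected by the database itself, so the rules hold no matter which application writes the data.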
Physical Schema Design:
It deals with the physical implementation of the database in a database management system. It includes the specification of data elements, data types, indexing, etc. All this information is stored in the data dictionary.
- Information needed for physical file and database design includes:
- Normalized relations plus size estimates for them
- Definitions of each attribute
- Descriptions of where and when data are used (entered, retrieved, deleted, updated), and how often
- Expectations and requirements for response time, data security, backup, recovery, retention, and integrity
- Descriptions of the technologies used to implement the database
Things that should be considered in the physical schema design
- Usage Type: OLTP, OLAP, Production, Dev, Test, etc.
- Data Storage: choose the appropriate data types and storage (main memory vs. secondary memory)
- Database Size (expected data file and log file growth): size of relations -> number of tuples
- Processing Time: processor count and speed according to the expected growth
- Operating System: OS capacity
- Access Methods (Physical Sequential, Indexed Sequential, Indexed Random, Inverted, Direct, Hashed)
- CRUD Matrix (Database Usage – Create, Retrieve, Update and Delete)
- Security – Security considerations while storing data files
- Disaster Recovery – Recovery planning according to the business
- Indexing: index primary keys, foreign keys, and attributes used in filters and sorts. Do not index attributes that have few distinct values.
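The indexing guideline above can be sketched with sqlite3; the Orders table is a hypothetical example, and EXPLAIN QUERY PLAN shows whether the optimizer actually uses the index for a given filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,   -- primary key is indexed automatically
    CustomerID INTEGER,               -- foreign-key-style column used in filters
    OrderDate  TEXT,                  -- used in sorts and range filters
    Status     TEXT                   -- few distinct values: poor index candidate
)
""")

# Index the columns used in filters and sorts, not the low-cardinality one.
conn.execute("CREATE INDEX idx_orders_customer ON Orders(CustomerID)")
conn.execute("CREATE INDEX idx_orders_date ON Orders(OrderDate)")

conn.execute("INSERT INTO Orders VALUES (1, 42, '2024-01-15', 'open')")

# The query plan confirms the filter on CustomerID is served by the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Orders WHERE CustomerID = 42"
).fetchall()
print(plan)
```

On a large table this turns a full scan into an index search; on `Status`, with only a handful of distinct values, an index would rarely pay for its maintenance cost.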
Normalization is a method of arranging the data elements in an organized format to optimize the data access. Normalization Avoids:
- Duplication of Data
- Insert Anomaly
- Delete Anomaly
- Update Anomaly
There are 5+ normalization rules (five normal forms plus Boyce-Codd), but most day-to-day database creation focuses on the first three. These rules help create flexible, adaptable tables that have no redundancy and are easy to query. Before normalizing, begin with a list of all of the fields that must appear in the database; think of this as one big table.
First Normal Form:
- Ensure atomic values and remove repeating groups by moving those groups into new tables.
- Example 1: divide a Name column into "First_Name" and "Last_Name"
- Example 2: one field called "Month" instead of 12 columns called "Jan", "Feb", etc.
Second Normal Form:
- Remove Partial Dependencies.
- Partial Dependency: a type of functional dependency where an attribute is functionally dependent on only part of the primary key (so the primary key must be a composite key).
- Ex: the table below has a composite primary key on (StudentID, CourseID) and is not in second normal form, because the column StudentName is partially dependent on the primary key (on StudentID only):

StudentID, CourseID, StudentName, Grade

Remove the partial dependency by dividing it into two tables:

StudentID, StudentName
StudentID, CourseID, Grade

Now the above tables are in second normal form.
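The 2NF decomposition above can be sketched in sqlite3. This is an illustrative schema following the article's StudentID/CourseID example; the sample data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (
    StudentID   INTEGER PRIMARY KEY,
    StudentName TEXT NOT NULL          -- stored once per student, not per enrollment
);
CREATE TABLE Enrollments (
    StudentID INTEGER REFERENCES Students(StudentID),
    CourseID  TEXT,
    Grade     TEXT,
    PRIMARY KEY (StudentID, CourseID)  -- Grade depends on the whole composite key
);
""")

conn.execute("INSERT INTO Students VALUES (1, 'Alice')")
conn.executemany("INSERT INTO Enrollments VALUES (?, ?, ?)",
                 [(1, 'CS101', 'A'), (1, 'MA201', 'B')])

# The name is no longer repeated per enrollment; a join reassembles the original view.
rows = conn.execute("""
    SELECT s.StudentName, e.CourseID, e.Grade
    FROM Students s JOIN Enrollments e ON s.StudentID = e.StudentID
""").fetchall()
print(rows)
```

Renaming the student now means updating one row in Students instead of one row per enrollment.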
Third Normal Form:
- Remove transitive dependencies. (If the key A determines B, and B determines C, then A determines C only indirectly, through B.)
- Transitive Dependency: a type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key, so its value is only indirectly determined by the primary key.
- Ex: in the table below, CourseID is the primary key, but the column FacultyOffice does not depend on the primary key directly; it depends on the non-key column FacultyID (CourseID → FacultyID → FacultyOffice):

CourseID, Section, FacultyID, FacultyOffice
Hence divide the table into (CourseID, Section, FacultyID) and (FacultyID, FacultyOffice) to remove the transitive dependency.
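The 3NF decomposition can likewise be sketched in sqlite3, following the article's CourseID/FacultyID/FacultyOffice example (sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty (
    FacultyID     INTEGER PRIMARY KEY,
    FacultyOffice TEXT                 -- office now depends only on FacultyID
);
CREATE TABLE Courses (
    CourseID  TEXT PRIMARY KEY,
    Section   TEXT,
    FacultyID INTEGER REFERENCES Faculty(FacultyID)
);
""")

conn.execute("INSERT INTO Faculty VALUES (7, 'Room 312')")
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [('CS101', 'A', 7), ('CS102', 'B', 7)])

# Moving the faculty office costs one update, not one update per course row.
conn.execute("UPDATE Faculty SET FacultyOffice = 'Room 410' WHERE FacultyID = 7")

offices = conn.execute("""
    SELECT c.CourseID, f.FacultyOffice
    FROM Courses c JOIN Faculty f ON c.FacultyID = f.FacultyID
""").fetchall()
print(offices)
```

Before the split, the update anomaly would have required changing FacultyOffice in every course row taught by that faculty member.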
Types of Relationships
There are three types of table relationships. Each has its own unique purpose in helping to organize the data within the database. These relationships can be used to determine joins in queries as well.
1) One-to-One Relationships: one record in a table is related to one record in a related table; creates equally dependent tables
Ex. one student has only one PSU ID
*NOTE: This type of relationship is rarely used.
2) One-to-Many Relationships: one record in a primary table is related to many records in a related table; however, a record in the related table has only one related record in the primary table
Ex. a student can live in one residence hall at a given time, but many students can live in a residence hall at a given time
*NOTE: This is the most common type of relationship.
3) Many-to-Many Relationships: several records in the primary table are related to several records in a related table
Ex. one student can be enrolled in many subjects and each subject can have many students enrolled
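A many-to-many relationship cannot be stored directly in two relational tables; the standard technique is a junction (associative) table that turns N:M into two 1:N relationships. A minimal sketch of the students-and-subjects example in sqlite3 (the sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Subjects (SubjectID TEXT PRIMARY KEY, Title TEXT);
-- Junction table: each row links one student to one subject.
CREATE TABLE Enrollments (
    StudentID INTEGER REFERENCES Students(StudentID),
    SubjectID TEXT REFERENCES Subjects(SubjectID),
    PRIMARY KEY (StudentID, SubjectID)
);
""")

conn.executemany("INSERT INTO Students VALUES (?, ?)", [(1, 'Alice'), (2, 'Bob')])
conn.executemany("INSERT INTO Subjects VALUES (?, ?)",
                 [('CS101', 'Databases'), ('MA201', 'Calculus')])
conn.executemany("INSERT INTO Enrollments VALUES (?, ?)",
                 [(1, 'CS101'), (1, 'MA201'), (2, 'CS101')])

# One student in many subjects, and one subject with many students.
per_subject = conn.execute(
    "SELECT SubjectID, COUNT(*) FROM Enrollments GROUP BY SubjectID"
).fetchall()
print(dict(per_subject))
```

The composite primary key on the junction table also prevents the same student from being enrolled in the same subject twice.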
Data / Database Models
Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization, by defining the data structures and the relationships between data elements. There are three types:
- Semantic /Conceptual Model
- Logical Model
- Physical Model
Databases appeared in the late 1960s, when the need for a flexible information management system arose. DBMS models are distinguished by how they represent the data they contain; they include:
- Network (CODASYL)
- Semi Structured
- Entity-Attribute-Value (EAV)
Signs of Good Database Design:
- Thoughtfully planned
- Works for the intended situation and purpose
- Streamlines a process
- Shows consistency among data (Fall 05 vs. fall 2005)
- Eliminates redundancy as much as possible, i.e. tables are normalized
- Provides users with an easy way to enter, store, and retrieve data
- Does NOT promote deleting data, but rather making designations or archiving it
- Provides unique identifiers (primary keys) for records
- Grows, changes, and improves as needed
Enforcing data integrity ensures the quality of the data in the database. For example, if an employee is entered with an employee_id value of 123, the database should not allow another employee to have an ID with the same value. Data integrity falls into these categories:
- Entity integrity
- Domain integrity
- Referential integrity
- User-defined integrity
The intention of entity integrity is to uniquely identify all the rows in a table. For this we define a primary key on the table.
A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a database system, the domain integrity is defined by:
- The data type and the length
- The NULL value acceptance
- The allowable values, through techniques like constraints or rules
- The default value

We can use CHECK constraints to restrict the column values.
Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the linked table are reflected in the primary table. We can implement this by using foreign key constraints.
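A minimal sketch of referential integrity and cascading delete in sqlite3 (the Departments/Employees schema is illustrative; note that SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this to enforce FKs
conn.executescript("""
CREATE TABLE Departments (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
CREATE TABLE Employees (
    EmpID  INTEGER PRIMARY KEY,
    DeptID INTEGER REFERENCES Departments(DeptID) ON DELETE CASCADE
);
""")
conn.execute("INSERT INTO Departments VALUES (10, 'Sales')")
conn.executemany("INSERT INTO Employees VALUES (?, ?)", [(1, 10), (2, 10)])

# Referential integrity: cannot reference a department that does not exist.
try:
    conn.execute("INSERT INTO Employees VALUES (3, 99)")
except sqlite3.IntegrityError:
    print("orphan row rejected")

# Cascading delete: removing the parent row removes its child rows too.
conn.execute("DELETE FROM Departments WHERE DeptID = 10")
remaining = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(remaining)  # 0
```

Without ON DELETE CASCADE, the same DELETE would instead be rejected, which is often the safer default.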
User-defined integrity allows you to define specific business rules that do not fall into one of the other integrity categories. All of the integrity categories support user-defined integrity (all column- and table-level constraints in CREATE TABLE, stored procedures, and triggers).