Database Design Concepts


People who work with database systems should have some knowledge of database design concepts. This article helps you understand the high-level process of designing a new relational database system. These concepts will help you in designing, developing, administering and maintaining database systems. A SQL Developer / DBA with 5+ years of experience will be asked a common interview question: “What are the phases in database design?” / “Any idea what steps need to be taken to design a secure, scalable and optimized database system?” / “How do you design a database?”. Here is the answer.

Database Design Concepts

Here we are going to learn the concepts below, which will help you in designing a new database.

  • Database Design Phases
  • Normalization
  • Types of Relationships
  • Database Models
  • Database Integrity

Database Design Phases

  • Requirement Specification and Analysis
  • Conceptual / Semantic Database Design
  • Implementation / Logical Schema Design
  • Physical Schema Design
  • Optimization / Administration

Requirement Specification and Analysis:

  • In this phase a detailed analysis of the requirement is done. The objective of this phase is to get a clear understanding of the requirements.
  • The database designer’s first step is to draw up a data requirements document. The requirements document contains a concise and non-technical summary of what data items will be stored in the database, and how the various data items relate to one another.
  • Taking the ‘data requirements document’, further analysis is done to give meaning to the data items, e.g. define the more detailed attributes of the data and define constraints if needed. The result of this analysis is a ‘preliminary specifications’ document.
  • Some of the information gathering methods are:
    • Interview
    • Analyzing documents and existing system or data
    • Survey
    • Site visit
    • Joint Applications Design (JAD) and Joint Requirements Analysis (JRA)
    • Prototyping
    • Discussions / Meetings
    • Observing the enterprise in operation
    • Research
    • Questionnaire
  • From all these methods we need to:
  • Identify essential “real world” information
  • Remove redundant, unimportant details
  • Clarify unclear natural language statements
  • Fill remaining gaps in discussions
  • Distinguish data and operations
  • Move to the next phase to build a model of the gathered data.

Conceptual Database Design:

The requirement analysis is modeled in this conceptual design. Conceptual database design involves modeling the collected information at a high level of abstraction. An ER (Entity-Relationship) diagram is used to represent this conceptual design; it consists of Entities, Attributes and Relationships.

  • Allows easy communication between end-users and developers.
  • Has a clear method to convert from the high-level model to the relational model.
  • The conceptual schema is a permanent description of the database requirements.
  • What should be the outcome from this phase:
  • Entities and relationships in the enterprise
  • Information about these entities and relationships that we need to store in the database
  • Integrity constraints or business rules that hold
  • Type of relationship or Cardinality (1:1, 1:N, N:M) should be defined
  • At the end of this phase the database ‘schema’ in the ER Model can be represented pictorially (ER diagrams).

Logical Schema Design:

Once the relationships and dependencies are identified, the data can be arranged into logical structures and mapped onto database management system tables. Normalization is performed to put the relations into appropriate normal forms. A small DDL sketch follows the list below.

  • Map all entities, attributes and relationships from ER diagrams to relational database objects
  • Map regular entities
  • Map weak entities
  • Map binary relationships
  • Map associative entities
  • Map unary relationships
  • Map ternary relationships
  • Map supertype/subtype relationships
  • Find out the anomalies
  • Insertion: data about an attribute must be inserted in more than one place (e.g., CourseID, SID, SName)
  • Deletion: removing data for an attribute requires a delete operation in more than one place
  • Update: updating data for an attribute requires an update operation in more than one place
  • Identify the candidate\primary keys
  • Normalize all the relations in the database by following the normal forms
  • First normal form (No Multivalued Dependency)
  • Second normal form (No Partial Dependency)
  • Third normal form (No Transitive Dependency)
  • Boyce-Codd normal form
  • Fourth Normal form
  • Fifth Normal form
  • Apply all constraints from the ER diagrams to ensure database integrity
  • Domain integrity – allowable values for an attribute in its domain
  • Entity integrity – no primary key attribute may be null
  • Referential integrity – data must be consistent (primary key / foreign key matching), with enforcement on delete (e.g., cascade)
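
As an illustration of this mapping, here is a minimal T-SQL sketch (all table, column and constraint names are hypothetical) showing two regular entities, an associative entity for an M:N relationship, and the three integrity constraints listed above:

    -- Regular entities; the primary key enforces entity integrity
    CREATE TABLE Student (
        StudentID   INT          NOT NULL PRIMARY KEY,
        StudentName VARCHAR(100) NOT NULL
    );

    CREATE TABLE Course (
        CourseID   INT          NOT NULL PRIMARY KEY,
        CourseName VARCHAR(100) NOT NULL
    );

    -- Associative entity mapping the M:N relationship between Student and Course
    CREATE TABLE Enrollment (
        StudentID INT     NOT NULL,
        CourseID  INT     NOT NULL,
        Grade     CHAR(2) NULL,
        CONSTRAINT PK_Enrollment PRIMARY KEY (StudentID, CourseID),   -- entity integrity
        CONSTRAINT FK_Enrollment_Student FOREIGN KEY (StudentID)
            REFERENCES Student (StudentID) ON DELETE CASCADE,         -- referential integrity
        CONSTRAINT FK_Enrollment_Course FOREIGN KEY (CourseID)
            REFERENCES Course (CourseID),
        CONSTRAINT CK_Enrollment_Grade
            CHECK (Grade IN ('A','B','C','D','F'))                    -- domain integrity
    );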

Physical Schema Design:

It deals with the physical implementation of the database in a database management system. It includes the specification of data elements, data types, indexing, etc. All this information is stored in the data dictionary.

  • Information needed for physical file and database design includes:
  • Normalized relations plus size estimates for them
  • Definitions of each attribute
  • Descriptions of where and when data are used (entered, retrieved, deleted, updated, and how often)
  • Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
  • Descriptions of the technologies used to implement the database

Things that should be considered in the physical schema design:

  • Usage Type: OLTP, OLAP, Production, Dev, Test, etc.
  • Data Storage: choose the appropriate data types and decide what resides in main memory versus secondary memory
  • Database Size (expected data file and log file growth): size of relations -> number of tuples
  • Processing Time: processors in use and their speed, relative to the expected growth
  • Operating System: OS capacity
  • Access Methods (Physical Sequential, Indexed Sequential, Indexed Random, Inverted, Direct, Hashed)
  • CRUD Matrix (Database Usage – Create, Retrieve, Update and Delete)
  • Security – Security considerations while storing data files
  • Disaster Recovery – Recovery planning according to the business
  • Indexing: index primary keys, foreign keys, and attributes used in filters and sorts. Do not index attributes that have only a few distinct values. (See the sketch after this list.)
  • Caching
  • De-normalizing
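
A hedged sketch of the indexing guideline above, assuming the hypothetical Student / Enrollment tables from the earlier sketch:

    -- The primary key already gets a (clustered) index by default in SQL Server.
    -- Index a foreign key that participates in joins:
    CREATE NONCLUSTERED INDEX IX_Enrollment_CourseID
        ON dbo.Enrollment (CourseID);

    -- Index an attribute used in filters and sorts:
    CREATE NONCLUSTERED INDEX IX_Student_StudentName
        ON dbo.Student (StudentName);

    -- Do NOT index low-selectivity attributes (e.g., a status flag with few
    -- distinct values); such indexes are rarely used and slow down writes.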

Normalization

Normalization is a method of arranging the data elements in an organized format to optimize data access. Normalization avoids:

  • Duplication of Data
  • Insert Anomaly
  • Delete Anomaly
  • Update Anomaly

Normal Forms:

There are 5+ normalization rules (five normal forms plus Boyce-Codd), but most day-to-day database design focuses on the first three. These rules help create flexible, adaptable tables that have no redundancy and are easy to query. Before normalizing, begin with a list of all the fields that must appear in the database; think of this as one big table.

First Normal Form:

  • Remove repeating groups by moving those groups into new tables.
  • Example 1: Divide a Name column into “First_Name” and “Last_Name”.
  • Example 2: One field called “Month” instead of 12 columns called “Jan”, “Feb”, etc. (see the sketch below).
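
A minimal sketch of Example 2, assuming a hypothetical monthly sales table:

    -- Not in 1NF: a repeating group of month columns
    CREATE TABLE MonthlySales_Bad (
        ProductID INT   NOT NULL,
        JanAmount MONEY NULL,
        FebAmount MONEY NULL,
        -- ... ten more columns, one per month ...
        DecAmount MONEY NULL
    );

    -- 1NF: the repeating group becomes a single month column, one row per month
    CREATE TABLE MonthlySales (
        ProductID  INT     NOT NULL,
        SalesMonth TINYINT NOT NULL CHECK (SalesMonth BETWEEN 1 AND 12),
        Amount     MONEY   NULL,
        PRIMARY KEY (ProductID, SalesMonth)
    );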

Second Normal Form:

  • Remove partial dependencies.
  • Partial Dependency: a type of functional dependency where an attribute is functionally dependent on only part of the primary key (the primary key must be a composite key); in other words, a column in a table is partially dependent on the primary key.
  • Ex: The table below has a composite primary key on (StudentID, CourseID) and is not in second normal form because the column StudentName is partially dependent on the primary key (on StudentID only).

StudentID, CourseID, StudentName, Grade

Remove the partial dependency by dividing it into two tables:

StudentID, CourseID, Grade

and

StudentID, StudentName

Now the above tables are in second normal form.

Third Normal Form:

  • Remove transitive dependencies. (A -> B -> C: C depends on A only indirectly, through B.)
  • Transitive Dependency: a type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key; thus its value is only indirectly determined by the primary key.
  • Ex: In the table below, CourseID is the primary key, but the column FacultyOffice does not depend directly on the primary key; it depends on the non-key column FacultyID:

CourseID -> FacultyID -> FacultyOffice

CourseID, Section, FacultyID, FacultyOffice

Hence divide the table to remove the transitive dependency: (CourseID, Section, FacultyID) and (FacultyID, FacultyOffice), as sketched below.
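
The decomposition as a hedged T-SQL sketch (table names and data types are assumptions):

    -- Violates 3NF: FacultyOffice depends on FacultyID, which is not the key
    CREATE TABLE CourseSection_Bad (
        CourseID      INT         NOT NULL PRIMARY KEY,
        Section       CHAR(3)     NOT NULL,
        FacultyID     INT         NOT NULL,
        FacultyOffice VARCHAR(20) NOT NULL  -- transitive: CourseID -> FacultyID -> FacultyOffice
    );

    -- 3NF: move the transitively dependent attribute to a Faculty table
    CREATE TABLE Faculty (
        FacultyID     INT         NOT NULL PRIMARY KEY,
        FacultyOffice VARCHAR(20) NOT NULL
    );

    CREATE TABLE CourseSection (
        CourseID  INT     NOT NULL PRIMARY KEY,
        Section   CHAR(3) NOT NULL,
        FacultyID INT     NOT NULL REFERENCES Faculty (FacultyID)
    );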

Types of Relationships

There are three types of table relationships. Each has its own unique purpose in helping to organize the data within the database. These relationships can be used to determine joins in queries as well.

1) One-to-One Relationships: one record in a table is related to one record in a related table; this creates equally dependent tables

Ex. one student has only one PSU ID

*NOTE: This type of relationship is rarely used.

2) One-to-Many Relationships: one record in a primary table is related to many records in a related table; however, a record in the related table has only one related record in the primary table

Ex. a student can live in one residence hall at a given time, but many students can live in a residence hall at a given time

*NOTE: This is the most common type of relationship.

3) Many-to-Many Relationships: several records in the primary table are related to several records in a related table

Ex. one student can be enrolled in many subjects and each subject can have many students enrolled
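
Hedged T-SQL sketches of all three relationship types, reusing the hypothetical Student and Course tables from the earlier sketch (the M:N junction table repeats the Enrollment idea shown there):

    -- 1:1: a UNIQUE foreign key allows at most one ID-card row per student
    CREATE TABLE IdCard (
        CardID    INT NOT NULL PRIMARY KEY,
        StudentID INT NOT NULL UNIQUE REFERENCES Student (StudentID)
    );

    -- 1:N: many students can reference one residence hall
    CREATE TABLE ResidenceHall (
        HallID   INT         NOT NULL PRIMARY KEY,
        HallName VARCHAR(50) NOT NULL
    );

    ALTER TABLE Student ADD HallID INT NULL REFERENCES ResidenceHall (HallID);

    -- M:N: resolved through a junction (associative) table
    CREATE TABLE Enrollment (
        StudentID INT NOT NULL REFERENCES Student (StudentID),
        CourseID  INT NOT NULL REFERENCES Course (CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );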

Data / Database – Models

Data Modeling:

Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization, by defining the data structures and the relationships between data elements. There are three types:

  • Semantic /Conceptual Model
  • Logical Model
  • Physical Model

Database Models:

Databases appeared in the late 1960s, at a time when the need for a flexible information management system had arisen. There are several database models, which are distinguished by how they represent the data they contain:

  • Hierarchical
  • Network (CODASYL)
  • Object/Relational
  • Object-Oriented
  • Multi-dimensional
  • Semi Structured
  • Associative
  • Entity-Attribute-Value (EAV)
  • Context

Signs of Good Database Design:

  • Thoughtfully planned
  • Works for the intended situation and purpose
  • Streamlines a process
  • Shows consistency among data (Fall 05 vs. fall 2005)
  • Eliminates redundancy as much as possible, i.e. tables are normalized
  • Provides users with an easy way to enter, store, and retrieve data
  • Does NOT promote deleting data, but rather making designations or archiving it
  • Provides unique identifiers (primary keys) for records
  • Grows, changes, and improves as needed

Database Integrity

Enforcing data integrity ensures the quality of the data in the database. For example, if an employee is entered with an employee_id value of 123, the database should not allow another employee to have an ID with the same value. Data integrity falls into these categories:

  • Entity integrity
  • Domain integrity
  • Referential integrity
  • User-defined integrity

Entity Integrity:

The intention of entity integrity is to uniquely identify every row in a table. For this we add a primary key to the table.

Domain Integrity:

A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a database system, the domain integrity is defined by:

  • The data type and the length
  • The NULL value acceptance
  • The allowable values, through techniques like constraints or rules
  • The default value

We can use a CHECK constraint to restrict the column values.

Referential Integrity:

Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the primary (referenced) table are reflected in the table that holds the foreign key. We can implement this by using foreign key constraints.
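
A minimal sketch of referential integrity with cascading actions, using hypothetical Department and Employee tables:

    CREATE TABLE Department (
        DeptID   INT         NOT NULL PRIMARY KEY,
        DeptName VARCHAR(50) NOT NULL
    );

    CREATE TABLE Employee (
        EmpID  INT NOT NULL PRIMARY KEY,
        DeptID INT NOT NULL
            CONSTRAINT FK_Employee_Department
            REFERENCES Department (DeptID)
            ON UPDATE CASCADE   -- key changes in Department propagate to Employee
            ON DELETE CASCADE   -- deleting a department removes its employees
    );

    -- Rejected by the constraint: DeptID 99 has no corresponding Department row
    -- INSERT INTO Employee (EmpID, DeptID) VALUES (1, 99);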

User-Defined Integrity:

User-defined integrity allows you to define specific business rules that do not fall into one of the other integrity categories. All of the integrity categories support user-defined integrity (all column- and table-level constraints in CREATE TABLE, stored procedures, and triggers).


Stored Procedure Code Review Checklist


In any enterprise environment, developing a database system includes database code review. We have to document and follow best practices in designing and developing database code. While interacting with customers we may often need to answer the question “What do you consider when reviewing code?” Here I am going to showcase the stored procedure code review checklist that we use for reviewing or unit testing a stored procedure.

Stored Procedure Development Life Cycle:

First let’s see the steps followed in developing a stored procedure:

  • Get the requirement – all required inputs and expected outcome
  • Write the test cases to validate the business requirement
  • Design and create the code / stored procedure
  • Debug, Compile and Run the procedure
  • Cross check all test cases passed
  • Send it for code review
  • Apply suggested changes if any from code review
  • Deploy it to the intended environment
  • Document the process

Why Best Practices / Code Review?

In an enterprise environment we follow a best practices guide for developing and reviewing database code. Now we’ll see the advantages of following the best practices guide and reviewing the code.

Dependency:

It minimizes resource dependency, as all database objects follow a specific standard. Anyone can easily understand and deal with code changes.

Integration:

The code is easily integrated with the existing environment.

Optimization:

We end up with optimized, error-free code.

Stored Procedure Code Review Checklist

We have defined the checklist category-wise. Below are the various categories in the stored procedure code review checklist.

  • General Standards
  • Scalability
  • Security
  • Transactions
  • Performance
  • Functionality
  • Environment Based Settings

General Standards (Code Format, Naming Conventions, Datatype and Data Length, Syntax):

  • Always follow a template when designing a stored procedure, as it makes the developer’s job easier during design and integration. For example, each stored procedure should be divided into blocks such as “Comments Section”, “Variable Declaration”, “Actual Body”, “Audit”, “Exception Handling”, “Temp_Obj_Removal”, plus environment-specific sections if required (see the template sketch after this list).
  • Check that proper comments are used. Always describe the procedure, its inputs and its expected output in the comments section.
  • Check that naming conventions are used properly for the procedure name, variables and other internal objects.
  • Check that all objects used inside the procedure are prefixed with the schema name and that column names are referenced with a table alias.
  • Check that all table columns used / mapped use the correct datatypes and column lengths.
  • Check whether all required SET options are enabled.
  • Check whether any temporary objects (temporary tables, cursors, etc.) are used; if yes, make sure these objects are closed / removed once their usage is done.
  • Make sure errors are handled properly.
  • Define NULL acceptance in the procedure and code accordingly.
  • Line up parameter names, data types, and default values.
  • Check that spaces and line breaks are used properly.
  • Check that BEGIN / END are used properly.
  • Check that parentheses are used properly around AND / OR blocks.
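
A minimal template sketch following the block structure above; the procedure, table and column names are hypothetical, not a prescribed standard:

    CREATE PROCEDURE dbo.usp_GetStudentGrades
        @StudentID INT,            -- required input
        @CourseID  INT = NULL      -- optional input; parameters lined up with defaults
    AS
    /* =====================================================================
       Comments Section (hypothetical example)
       Purpose : return grades for a student, optionally filtered by course
       Inputs  : @StudentID (required), @CourseID (optional)
       Output  : result set (CourseID, Grade)
       ===================================================================== */
    BEGIN
        SET NOCOUNT ON;            -- required SET options block

        -- Variable Declaration block
        DECLARE @RowsReturned INT = 0;

        BEGIN TRY
            -- Actual Body: objects schema-prefixed, columns referenced via alias
            SELECT e.CourseID, e.Grade
            FROM   dbo.Enrollment AS e
            WHERE  e.StudentID = @StudentID
              AND (@CourseID IS NULL OR e.CourseID = @CourseID);  -- NULL acceptance

            SET @RowsReturned = @@ROWCOUNT;  -- Audit block could log this value
        END TRY
        BEGIN CATCH
            -- Exception Handling block: rethrow (or log to a central error table)
            THROW;   -- SQL Server 2012+
        END CATCH;
    END;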

Scalability:

  • Use full keyword names (e.g., PROCEDURE instead of PROC) for better integration.
  • Check whether we are using any deprecated features.
  • Check whether any other / nested dependent objects are used; if so, make sure all such objects are available in the DB and functioning properly, and add them to the dependency list.
  • Never use “SELECT *”; instead list all required columns.
  • If there are any triggers defined on tables used inside the procedure, make sure these triggers work as expected.

Security:

  • In case of errors, make sure the complete error information is not thrown to the application; instead use a centralized table to hold the error information and send a custom error message to the application.
  • Apply encryption procedures while dealing with sensitive information (e.g., credit card numbers, pass codes).
  • If any dynamic SQL is used, make sure it executes only through sp_executesql (see the sketch after this list).
  • Prefer views instead of tables wherever possible.
  • Document all permissions required to run the procedure.
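
A hedged sketch of the dynamic SQL guideline: parameterize through sp_executesql instead of concatenating values into the statement (table and parameter names are hypothetical):

    -- Unsafe pattern (injection risk): concatenating user input into the statement
    -- SET @sql = N'SELECT ... WHERE StudentName = ''' + @Name + '''';

    DECLARE @sql  NVARCHAR(MAX),
            @Name VARCHAR(100) = 'Smith';

    SET @sql = N'SELECT s.StudentID, s.StudentName
                 FROM dbo.Student AS s
                 WHERE s.StudentName = @pName';

    EXEC sys.sp_executesql
         @sql,
         N'@pName VARCHAR(100)',    -- parameter definition list
         @pName = @Name;            -- value is bound safely at execution time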

Transactions:

  • If any transactions are used, check that they follow the ACID properties as per the business requirement.
  • Keep the transaction length as short as possible; do not select data within the transaction. Rather, select the required data before starting the transaction and process it inside the transaction.
  • Check that COMMIT and ROLLBACK happen as expected; cross-check when using nested stored procedures (see the sketch after this list).
  • Avoid transactions that require user input to commit.
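
A minimal sketch of these transaction points: data selected before the transaction, a short transaction window, and a rollback that is cross-checked for nesting (table names are hypothetical):

    SET XACT_ABORT ON;   -- auto-rollback on most errors keeps state consistent

    -- Select the required data BEFORE the transaction starts, not inside it
    DECLARE @TargetStudentID INT =
        (SELECT MAX(e.StudentID) FROM dbo.Enrollment AS e);

    BEGIN TRY
        BEGIN TRANSACTION;   -- keep this window as short as possible

        UPDATE e
        SET    e.Grade = 'A'
        FROM   dbo.Enrollment AS e
        WHERE  e.StudentID = @TargetStudentID;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0   -- cross-check: a nested caller may own an outer transaction
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH;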

Performance:

  • Cross-check that we are selecting / retrieving only the required data throughout the procedure; always use column names instead of “SELECT *”.
  • Check the column order in the WHERE clause; remember it impacts index usage, and change the order if required.
  • Avoid using functions / conversions while selecting and comparing. If the result set has 30 rows, the function is called >= 30 times. For example, with “WHERE <TAB.ColName> = MONTH(@DateParam)”, we can instead assign the value to a local variable and use that variable in the WHERE clause.
  • Cross-check whether there is a better / shorter way to get the same outcome with fewer joins.
  • Always filter data as much as we can, and only then apply the required operations.
  • Have a look at aggregations, if any. Always aggregate over the shortest possible dataset. For example, take the requirement “We want to know the top-selling product details on each eStore.” Do not join Product_Orders and Product_Details and group by e-store name while selecting the max revenue per e-store. Instead, first get the ProductIDs with the highest income per e-store and then map them to Product_Details.
  • Check whether there is any chance of bad parameter sniffing: make sure procedure parameters are assigned to local variables and that queries refer to these variables instead of directly referencing the procedure parameters (see the sketch after this list).
  • Choose the best temporary object (Temp_Table, Table_Variable, CTE or Derived_Table) based on the requirement; here we should predict the near future.
  • Try to use TRUNCATE instead of DELETE whenever possible; remember, we should know the difference and the impact.
  • Check the response / execution time and make sure it is under the benchmark.
  • Avoid cursors; use a WHILE loop instead.
  • Check for alternatives to costly operators such as NOT LIKE.
  • Make sure the procedure returns only the required rows and columns.
  • Analyze the cost-based execution plan: make sure there are no Bookmark / RID Lookups, no Table / Index Scans taking excessive cost, and no unnecessary Sorts (check whether we really need the ORDER BY), and compare estimated vs. actual row counts.
  • Have a look at optimizer overrides: review the code to determine whether index hints or NOLOCK clauses are really necessary. These hints could be beneficial in the short term, but such overrides might have a negative impact when data changes happen or the database is migrated to a new version.
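
A hedged sketch of this parameter sniffing workaround (procedure and table names are hypothetical):

    CREATE PROCEDURE dbo.usp_OrdersByStore
        @StoreID INT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Copy the parameter into a local variable; the optimizer then uses an
        -- average-density estimate instead of the value sniffed at compile time
        DECLARE @LocalStoreID INT = @StoreID;

        SELECT o.OrderID, o.OrderDate
        FROM   dbo.Product_Orders AS o
        WHERE  o.StoreID = @LocalStoreID;   -- refer to the variable, not @StoreID
    END;

Note that this trades a sniffed plan for a generic one; alternatives such as OPTION (RECOMPILE) exist, so treat this as one option rather than the only fix.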

Functionality:

  • Prepare various test cases and make sure all test cases pass. For example, prepare a test case that sends all possible inputs to a procedure, defines the expected output and compares it with the actual output (see the sketch after this list).
  • Check that the code is error-free, parses correctly and executes without any issue.
  • Check that the output result set is correct: number of rows, number of columns, column names, datatypes, etc.
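
A minimal sketch of such a test case, assuming the hypothetical usp_GetStudentGrades procedure from the earlier template sketch:

    -- Expected: 2 rows for StudentID = 1 (the expected value comes from the test case)
    DECLARE @Expected INT = 2,
            @Actual   INT;

    CREATE TABLE #Result (CourseID INT, Grade CHAR(2));

    -- Capture the procedure's output for comparison
    INSERT INTO #Result (CourseID, Grade)
    EXEC dbo.usp_GetStudentGrades @StudentID = 1;

    SET @Actual = (SELECT COUNT(*) FROM #Result);

    IF @Actual = @Expected PRINT 'PASS' ELSE PRINT 'FAIL';

    DROP TABLE #Result;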

Environment Based Settings:

  • Document all environments based settings and follow those instructions in designing the procedure.

Ex 1: In a highly secure database environment, all procedures should call a nested audit procedure which collects all session details and stores them in an audit table.

Ex 2: In some environments, before performing any bulk operation we have to get the latest lookup values into the lookup tables.

Summary:

  • Always define your own standards and follow best practices in designing database code.
  • Prepare an Excel sheet with the required checklist and make sure the SQL developer fills in the sheet before sending the code for review.
  • Initially it might take some extra time in the development phase, but it simplifies the code review process and leads to better productivity.

Freelancing with toptal


One of the best freelancing zones: TOPTAL

Have you ever worked as a freelancer? Recently I came across a unique freelancing site. Let’s see how it is different from other freelancing sites.

  • Here you need not chase clients: you need not worry about bids and all that stuff; just go through the interview process and join the community.
  • Comparatively, you can opt for high billing rates (long term, short term and hourly).
  • You still get to work with the best clients: AXEL SPRINGER, J.P.MORGAN, KDDI America and a lot more.

It says “Hire Top 3% Freelance Developers”, as they follow a rigorous process before allowing a developer into the community. Process to join TopTal:

  • Language & Personality: HR will check your communication and English speaking skills
  • Timed Algorithm Test: Online test
  • Technical Screening: By sharing your screen with their senior engineers
  • Building a Test Application: you need to build this and present the project.

Please have a look here to know more about the hiring process.

The test pattern might change based on your background and technical skills, but broadly it is a 4-step process. Once you clear the interview process, you can be a part of the community. People experienced in scripting and programming languages can give it a try.

Freelancing can add some extra dollars to your income, but that is tiny compared to the knowledge you gain by working on and handling a project / task independently.

By the way, the TopTal engineering team has published a few SQL Server interview questions. Have a look here for those questions and answers.

4 Essential SQL Server Interview Questions

Posted in Miscellaneous | Tagged , , , | 4 Comments