Database Design Concepts


People who work with database systems should have some knowledge of database design concepts. This article helps you understand the high-level process of designing a new relational database system. These concepts will help you in designing, developing, administering and maintaining database systems. A SQL Developer / DBA with 5+ years of experience will be asked a common interview question: “What are the phases in database design?” / “Any idea what steps need to be taken to design a secure, scalable and optimized database system?” / “How do you design a database?”. Here is the answer.

Database Design Concepts

Here we are going to learn the concepts below, which will help you in designing a new database.

  • Database Design Phases
  • Normalization
  • Types of Relationships
  • Database Models
  • Database Integrity

Database Design Phases

  • Requirement Specification and Analysis
  • Conceptual / Semantic Database Design
  • Implementation / Logical Schema Design
  • Physical Schema Design
  • Optimization / Administration

Requirement Specification and Analysis:

  • In this phase a detailed analysis of the requirement is done. The objective of this phase is to get a clear understanding of the requirements.
  • The database designer’s first step is to draw up a data requirements document. The requirements document contains a concise and non-technical summary of what data items will be stored in the database, and how the various data items relate to one another.
  • Taking the ‘data requirements document’, further analysis is done to give meaning to the data items, e.g. define the more detailed attributes of the data and define constraints if needed. The result of this analysis is a ‘preliminary specifications’ document.
  • Some of the information gathering methods are:
    • Interview
    • Analyzing documents and existing system or data
    • Survey
    • Site visit
    • Joint Applications Design (JAD) and Joint Requirements Analysis (JRA)
    • Prototyping
    • Discussions / Meetings
    • Observing the enterprise in operation
    • Research
    • Questionnaire
  • From all these methods we need to:
  • Identify essential “real world” information
  • Remove redundant, unimportant details
  • Clarify unclear natural language statements
  • Fill remaining gaps in discussions
  • Distinguish data and operations
  • Move to the next phase to build a model of the gathered data.

Conceptual Database Design:

The requirement analysis is modeled in this conceptual design. Conceptual database design involves modeling the collected information at a high level of abstraction. An ER (Entity-Relationship) diagram is used to represent this conceptual design; it consists of Entities, Attributes and Relationships.

  • Allows easy communication between end-users and developers.
  • Has a clear method to convert from the high-level model to the relational model.
  • The conceptual schema is a permanent description of the database requirements.
  • What should be the outcome from this phase:
  • Entities and relationships in the enterprise
  • Information about these entities and relationships that we need to store in the database
  • Integrity constraints or business rules that hold
  • Type of relationship or Cardinality (1:1, 1:N, N:M) should be defined
  • At the end of this phase the database ‘schema’ in the ER Model can be represented pictorially (ER diagrams).

Logical Schema Design:

Once the relationships and dependencies are identified, the data can be arranged into logical structures and mapped onto database management system tables. Normalization is performed to put the relations into appropriate normal forms. A small DDL sketch follows the list below.

  • Map all entities, attributes and relationships from ER diagrams to relational database objects
  • Map regular entities
  • Map weak entities
  • Map binary relationships
  • Map associative entities
  • Map unary relationships
  • Map ternary relationships
  • Map supertype/subtype relationships
  • Find out the anomalies
  • Insertion: data about an attribute must be inserted in more than one place (e.g., CourseID, SID, SName)
  • Deletion: removing data for an attribute requires a delete operation in more than one place
  • Update: updating data for an attribute requires an update operation in more than one place
  • Identify the candidate\primary keys
  • Normalize all the relations in the database by following the normal forms
  • First normal form (No Multivalued Dependency)
  • Second normal form (No Partial Dependency)
  • Third normal form (No Transitive Dependency)
  • Boyce-Codd normal form
  • Fourth Normal form
  • Fifth Normal form
  • Apply all constraints from the ER diagrams to ensure database integrity
  • Domain integrity – allowable values for an attribute in its domain
  • Entity integrity – no primary key attribute may be null
  • Referential integrity – data must be consistent (primary key / foreign key matching), with enforcement on delete (e.g., cascade)
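
As an illustration of this mapping, here is a minimal T-SQL sketch (all table, column and constraint names are hypothetical) showing two regular entities, an associative entity for an M:N relationship, and the three integrity constraints listed above:

    -- Regular entities; the primary key enforces entity integrity
    CREATE TABLE Student (
        StudentID   INT          NOT NULL PRIMARY KEY,
        StudentName VARCHAR(100) NOT NULL
    );

    CREATE TABLE Course (
        CourseID   INT          NOT NULL PRIMARY KEY,
        CourseName VARCHAR(100) NOT NULL
    );

    -- Associative entity mapping the M:N relationship between Student and Course
    CREATE TABLE Enrollment (
        StudentID INT     NOT NULL,
        CourseID  INT     NOT NULL,
        Grade     CHAR(2) NULL,
        CONSTRAINT PK_Enrollment PRIMARY KEY (StudentID, CourseID),   -- entity integrity
        CONSTRAINT FK_Enrollment_Student FOREIGN KEY (StudentID)
            REFERENCES Student (StudentID) ON DELETE CASCADE,         -- referential integrity
        CONSTRAINT FK_Enrollment_Course FOREIGN KEY (CourseID)
            REFERENCES Course (CourseID),
        CONSTRAINT CK_Enrollment_Grade
            CHECK (Grade IN ('A','B','C','D','F'))                    -- domain integrity
    );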

Physical Schema Design:

It deals with the physical implementation of the database in a database management system. It includes the specification of data elements, data types, indexing, etc. All this information is stored in the data dictionary.

  • Information needed for physical file and database design includes:
  • Normalized relations plus size estimates for them
  • Definitions of each attribute
  • Descriptions of where and when data are used (entered, retrieved, deleted, updated, and how often)
  • Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
  • Descriptions of the technologies used to implement the database

Things that should be considered in the physical schema design:

  • Usage Type: OLTP, OLAP, Production, Dev, Test, etc.
  • Data Storage: choose the appropriate data types and decide what resides in main memory versus secondary memory
  • Database Size (expected data file and log file growth): size of relations -> number of tuples
  • Processing Time: processors in use and their speed, relative to the expected growth
  • Operating System: OS capacity
  • Access Methods (Physical Sequential, Indexed Sequential, Indexed Random, Inverted, Direct, Hashed)
  • CRUD Matrix (Database Usage – Create, Retrieve, Update and Delete)
  • Security – Security considerations while storing data files
  • Disaster Recovery – Recovery planning according to the business
  • Indexing: index primary keys, foreign keys, and attributes used in filters and sorts. Do not index attributes that have only a few distinct values. (See the sketch after this list.)
  • Caching
  • De-normalizing
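
A hedged sketch of the indexing guideline above, assuming the hypothetical Student / Enrollment tables from the earlier sketch:

    -- The primary key already gets a (clustered) index by default in SQL Server.
    -- Index a foreign key that participates in joins:
    CREATE NONCLUSTERED INDEX IX_Enrollment_CourseID
        ON dbo.Enrollment (CourseID);

    -- Index an attribute used in filters and sorts:
    CREATE NONCLUSTERED INDEX IX_Student_StudentName
        ON dbo.Student (StudentName);

    -- Do NOT index low-selectivity attributes (e.g., a status flag with few
    -- distinct values); such indexes are rarely used and slow down writes.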

Normalization

Normalization is a method of arranging the data elements in an organized format to optimize data access. Normalization avoids:

  • Duplication of Data
  • Insert Anomaly
  • Delete Anomaly
  • Update Anomaly

Normal Forms:

There are 5+ normalization rules (five normal forms plus Boyce-Codd), but most day-to-day database design focuses on the first three. These rules help create flexible, adaptable tables that have no redundancy and are easy to query. Before normalizing, begin with a list of all the fields that must appear in the database; think of this as one big table.

First Normal Form:

  • Remove repeating groups by moving those groups into new tables.
  • Example 1: Divide a Name column into “First_Name” and “Last_Name”.
  • Example 2: One field called “Month” instead of 12 columns called “Jan”, “Feb”, etc. (see the sketch below).
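
A minimal sketch of Example 2, assuming a hypothetical monthly sales table:

    -- Not in 1NF: a repeating group of month columns
    CREATE TABLE MonthlySales_Bad (
        ProductID INT   NOT NULL,
        JanAmount MONEY NULL,
        FebAmount MONEY NULL,
        -- ... ten more columns, one per month ...
        DecAmount MONEY NULL
    );

    -- 1NF: the repeating group becomes a single month column, one row per month
    CREATE TABLE MonthlySales (
        ProductID  INT     NOT NULL,
        SalesMonth TINYINT NOT NULL CHECK (SalesMonth BETWEEN 1 AND 12),
        Amount     MONEY   NULL,
        PRIMARY KEY (ProductID, SalesMonth)
    );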

Second Normal Form:

  • Remove partial dependencies.
  • Partial Dependency: a type of functional dependency where an attribute is functionally dependent on only part of the primary key (the primary key must be a composite key); in other words, a column in a table is partially dependent on the primary key.
  • Ex: The table below has a composite primary key on (StudentID, CourseID) and is not in second normal form because the column StudentName is partially dependent on the primary key (on StudentID only).

StudentID, CourseID, StudentName, Grade

Remove the partial dependency by dividing it into two tables:

StudentID, CourseID, Grade

and

StudentID, StudentName

Now the above tables are in second normal form.

Third Normal Form:

  • Remove transitive dependencies. (A -> B -> C: C depends on A only indirectly, through B.)
  • Transitive Dependency: a type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key; thus its value is only indirectly determined by the primary key.
  • Ex: In the table below, CourseID is the primary key, but the column FacultyOffice does not depend directly on the primary key; it depends on the non-key column FacultyID:

CourseID -> FacultyID -> FacultyOffice

CourseID, Section, FacultyID, FacultyOffice

Hence divide the table to remove the transitive dependency: (CourseID, Section, FacultyID) and (FacultyID, FacultyOffice), as sketched below.
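
The decomposition as a hedged T-SQL sketch (table names and data types are assumptions):

    -- Violates 3NF: FacultyOffice depends on FacultyID, which is not the key
    CREATE TABLE CourseSection_Bad (
        CourseID      INT         NOT NULL PRIMARY KEY,
        Section       CHAR(3)     NOT NULL,
        FacultyID     INT         NOT NULL,
        FacultyOffice VARCHAR(20) NOT NULL  -- transitive: CourseID -> FacultyID -> FacultyOffice
    );

    -- 3NF: move the transitively dependent attribute to a Faculty table
    CREATE TABLE Faculty (
        FacultyID     INT         NOT NULL PRIMARY KEY,
        FacultyOffice VARCHAR(20) NOT NULL
    );

    CREATE TABLE CourseSection (
        CourseID  INT     NOT NULL PRIMARY KEY,
        Section   CHAR(3) NOT NULL,
        FacultyID INT     NOT NULL REFERENCES Faculty (FacultyID)
    );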

Types of Relationships

There are three types of table relationships. Each has its own unique purpose in helping to organize the data within the database. These relationships can be used to determine joins in queries as well.

1) One-to-One Relationships: one record in a table is related to one record in a related table; this creates equally dependent tables

Ex. one student has only one PSU ID

*NOTE: This type of relationship is rarely used.

2) One-to-Many Relationships: one record in a primary table is related to many records in a related table; however, a record in the related table has only one related record in the primary table

Ex. a student can live in one residence hall at a given time, but many students can live in a residence hall at a given time

*NOTE: This is the most common type of relationship.

3) Many-to-Many Relationships: several records in the primary table are related to several records in a related table

Ex. one student can be enrolled in many subjects and each subject can have many students enrolled
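
Hedged T-SQL sketches of all three relationship types, reusing the hypothetical Student and Course tables from the earlier sketch (the M:N junction table repeats the Enrollment idea shown there):

    -- 1:1: a UNIQUE foreign key allows at most one ID-card row per student
    CREATE TABLE IdCard (
        CardID    INT NOT NULL PRIMARY KEY,
        StudentID INT NOT NULL UNIQUE REFERENCES Student (StudentID)
    );

    -- 1:N: many students can reference one residence hall
    CREATE TABLE ResidenceHall (
        HallID   INT         NOT NULL PRIMARY KEY,
        HallName VARCHAR(50) NOT NULL
    );

    ALTER TABLE Student ADD HallID INT NULL REFERENCES ResidenceHall (HallID);

    -- M:N: resolved through a junction (associative) table
    CREATE TABLE Enrollment (
        StudentID INT NOT NULL REFERENCES Student (StudentID),
        CourseID  INT NOT NULL REFERENCES Course (CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );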

Data / Database – Models

Data Modeling:

Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization, by defining the data structures and the relationships between data elements. There are three types:

  • Semantic /Conceptual Model
  • Logical Model
  • Physical Model

Database Models:

Databases appeared in the late 1960s, at a time when the need for a flexible information management system had arisen. There are several database models, which are distinguished by how they represent the data they contain:

  • Hierarchical
  • Network (CODASYL)
  • Object/Relational
  • Object-Oriented
  • Multi-dimensional
  • Semi Structured
  • Associative
  • Entity-Attribute-Value (EAV)
  • Context

Signs of Good Database Design:

  • Thoughtfully planned
  • Works for the intended situation and purpose
  • Streamlines a process
  • Shows consistency among data (Fall 05 vs. fall 2005)
  • Eliminates redundancy as much as possible, i.e. tables are normalized
  • Provides users with an easy way to enter, store, and retrieve data
  • Does NOT promote deleting data, but rather making designations or archiving it
  • Provides unique identifiers (primary keys) for records
  • Grows, changes, and improves as needed

Database Integrity

Enforcing data integrity ensures the quality of the data in the database. For example, if an employee is entered with an employee_id value of 123, the database should not allow another employee to have an ID with the same value. Data integrity falls into these categories:

  • Entity integrity
  • Domain integrity
  • Referential integrity
  • User-defined integrity

Entity Integrity:

The intention of entity integrity is to uniquely identify every row in a table. For this we add a primary key to the table.

Domain Integrity:

A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a database system, the domain integrity is defined by:

  • The data type and the length
  • The NULL value acceptance
  • The allowable values, through techniques like constraints or rules
  • The default value

We can use a CHECK constraint to restrict the column values.

Referential Integrity:

Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the primary (referenced) table are reflected in the table that holds the foreign key. We can implement this by using foreign key constraints.
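
A minimal sketch of referential integrity with cascading actions, using hypothetical Department and Employee tables:

    CREATE TABLE Department (
        DeptID   INT         NOT NULL PRIMARY KEY,
        DeptName VARCHAR(50) NOT NULL
    );

    CREATE TABLE Employee (
        EmpID  INT NOT NULL PRIMARY KEY,
        DeptID INT NOT NULL
            CONSTRAINT FK_Employee_Department
            REFERENCES Department (DeptID)
            ON UPDATE CASCADE   -- key changes in Department propagate to Employee
            ON DELETE CASCADE   -- deleting a department removes its employees
    );

    -- Rejected by the constraint: DeptID 99 has no corresponding Department row
    -- INSERT INTO Employee (EmpID, DeptID) VALUES (1, 99);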

User-Defined Integrity:

User-defined integrity allows you to define specific business rules that do not fall into one of the other integrity categories. All of the integrity categories support user-defined integrity (all column- and table-level constraints in CREATE TABLE, stored procedures, and triggers).


Stored Procedure Code Review Checklist


In any enterprise environment, developing a database system includes database code review. We have to document and follow best practices in designing and developing database code. While interacting with customers we may often need to answer the question “What do you consider when reviewing code?” Here I am going to showcase the stored procedure code review checklist that we use for reviewing or unit testing a stored procedure.

Stored Procedure Development Life Cycle:

First let’s see the steps followed in developing a stored procedure:

  • Get the requirement – all required inputs and expected outcome
  • Write the test cases to validate the business requirement
  • Design and create the code / stored procedure
  • Debug, Compile and Run the procedure
  • Cross check all test cases passed
  • Send it for code review
  • Apply suggested changes if any from code review
  • Deploy it to the intended environment
  • Document the process

Why Best Practices / Code Review?

In an enterprise environment we follow a best practices guide for developing and reviewing database code. Now we’ll see the advantages of following the best practices guide and reviewing the code.

Dependency:

It minimizes resource dependency, as all database objects follow a specific standard. Anyone can easily understand and deal with code changes.

Integration:

The code is easily integrated with the existing environment.

Optimization:

We end up with optimized, error-free code.

Stored Procedure Code Review Checklist

We have defined the checklist category-wise. Below are the various categories in the stored procedure code review checklist.

  • General Standards
  • Scalability
  • Security
  • Transactions
  • Performance
  • Functionality
  • Environment Based Settings

General Standards (Code Format, Naming Conventions, Datatype and Data Length, Syntax):

  • Always follow a template when designing a stored procedure, as it makes the developer’s job easier during design and integration. For example, each stored procedure should be divided into blocks such as “Comments Section”, “Variable Declaration”, “Actual Body”, “Audit”, “Exception Handling”, “Temp_Obj_Removal”, plus environment-specific sections if required (see the template sketch after this list).
  • Check that proper comments are used. Always describe the procedure, its inputs and its expected output in the comments section.
  • Check that naming conventions are used properly for the procedure name, variables and other internal objects.
  • Check that all objects used inside the procedure are prefixed with the schema name and that column names are referenced with a table alias.
  • Check that all table columns used / mapped use the correct datatypes and column lengths.
  • Check whether all required SET options are enabled.
  • Check whether any temporary objects (temporary tables, cursors, etc.) are used; if yes, make sure these objects are closed / removed once their usage is done.
  • Make sure errors are handled properly.
  • Define NULL acceptance in the procedure and code accordingly.
  • Line up parameter names, data types, and default values.
  • Check that spaces and line breaks are used properly.
  • Check that BEGIN / END are used properly.
  • Check that parentheses are used properly around AND / OR blocks.
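
A minimal template sketch following the block structure above; the procedure, table and column names are hypothetical, not a prescribed standard:

    CREATE PROCEDURE dbo.usp_GetStudentGrades
        @StudentID INT,            -- required input
        @CourseID  INT = NULL      -- optional input; parameters lined up with defaults
    AS
    /* =====================================================================
       Comments Section (hypothetical example)
       Purpose : return grades for a student, optionally filtered by course
       Inputs  : @StudentID (required), @CourseID (optional)
       Output  : result set (CourseID, Grade)
       ===================================================================== */
    BEGIN
        SET NOCOUNT ON;            -- required SET options block

        -- Variable Declaration block
        DECLARE @RowsReturned INT = 0;

        BEGIN TRY
            -- Actual Body: objects schema-prefixed, columns referenced via alias
            SELECT e.CourseID, e.Grade
            FROM   dbo.Enrollment AS e
            WHERE  e.StudentID = @StudentID
              AND (@CourseID IS NULL OR e.CourseID = @CourseID);  -- NULL acceptance

            SET @RowsReturned = @@ROWCOUNT;  -- Audit block could log this value
        END TRY
        BEGIN CATCH
            -- Exception Handling block: rethrow (or log to a central error table)
            THROW;   -- SQL Server 2012+
        END CATCH;
    END;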

Scalability:

  • Use full keyword names (e.g., PROCEDURE instead of PROC) for better integration.
  • Check whether we are using any deprecated features.
  • Check whether any other / nested dependent objects are used; if so, make sure all such objects are available in the DB and functioning properly, and add them to the dependency list.
  • Never use “SELECT *”; instead list all required columns.
  • If there are any triggers defined on tables used inside the procedure, make sure these triggers work as expected.

Security:

  • In case of errors, make sure the complete error information is not thrown to the application; instead use a centralized table to hold the error information and send a custom error message to the application.
  • Apply encryption procedures while dealing with sensitive information (e.g., credit card numbers, pass codes).
  • If any dynamic SQL is used, make sure it executes only through sp_executesql (see the sketch after this list).
  • Prefer views instead of tables wherever possible.
  • Document all permissions required to run the procedure.
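
A hedged sketch of the dynamic SQL guideline: parameterize through sp_executesql instead of concatenating values into the statement (table and parameter names are hypothetical):

    -- Unsafe pattern (injection risk): concatenating user input into the statement
    -- SET @sql = N'SELECT ... WHERE StudentName = ''' + @Name + '''';

    DECLARE @sql  NVARCHAR(MAX),
            @Name VARCHAR(100) = 'Smith';

    SET @sql = N'SELECT s.StudentID, s.StudentName
                 FROM dbo.Student AS s
                 WHERE s.StudentName = @pName';

    EXEC sys.sp_executesql
         @sql,
         N'@pName VARCHAR(100)',    -- parameter definition list
         @pName = @Name;            -- value is bound safely at execution time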

Transactions:

  • If any transactions are used, check that they follow the ACID properties as per the business requirement.
  • Keep the transaction length as short as possible; do not select data within the transaction. Rather, select the required data before starting the transaction and process it inside the transaction.
  • Check that COMMIT and ROLLBACK happen as expected; cross-check when using nested stored procedures (see the sketch after this list).
  • Avoid transactions that require user input to commit.
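
A minimal sketch of these transaction points: data selected before the transaction, a short transaction window, and a rollback that is cross-checked for nesting (table names are hypothetical):

    SET XACT_ABORT ON;   -- auto-rollback on most errors keeps state consistent

    -- Select the required data BEFORE the transaction starts, not inside it
    DECLARE @TargetStudentID INT =
        (SELECT MAX(e.StudentID) FROM dbo.Enrollment AS e);

    BEGIN TRY
        BEGIN TRANSACTION;   -- keep this window as short as possible

        UPDATE e
        SET    e.Grade = 'A'
        FROM   dbo.Enrollment AS e
        WHERE  e.StudentID = @TargetStudentID;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0   -- cross-check: a nested caller may own an outer transaction
            ROLLBACK TRANSACTION;
        THROW;
    END CATCH;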

Performance:

  • Cross-check that we are selecting / retrieving only the required data throughout the procedure; always use column names instead of “SELECT *”.
  • Check the column order in the WHERE clause; remember it impacts index usage, and change the order if required.
  • Avoid using functions / conversions while selecting and comparing. If the result set has 30 rows, the function is called >= 30 times. For example, with “WHERE <TAB.ColName> = MONTH(@DateParam)”, we can instead assign the value to a local variable and use that variable in the WHERE clause.
  • Cross-check whether there is a better / shorter way to get the same outcome with fewer joins.
  • Always filter data as much as we can, and only then apply the required operations.
  • Have a look at aggregations, if any. Always aggregate over the shortest possible dataset. For example, take the requirement “We want to know the top-selling product details on each eStore.” Do not join Product_Orders and Product_Details and group by e-store name while selecting the max revenue per e-store. Instead, first get the ProductIDs with the highest income per e-store and then map them to Product_Details.
  • Check whether there is any chance of bad parameter sniffing: make sure procedure parameters are assigned to local variables and that queries refer to these variables instead of directly referencing the procedure parameters (see the sketch after this list).
  • Choose the best temporary object (Temp_Table, Table_Variable, CTE or Derived_Table) based on the requirement; here we should predict the near future.
  • Try to use TRUNCATE instead of DELETE whenever possible; remember, we should know the difference and the impact.
  • Check the response / execution time and make sure it is under the benchmark.
  • Avoid cursors; use a WHILE loop instead.
  • Check for alternatives to costly operators such as NOT LIKE.
  • Make sure the procedure returns only the required rows and columns.
  • Analyze the cost-based execution plan: make sure there are no Bookmark / RID Lookups, no Table / Index Scans taking excessive cost, and no unnecessary Sorts (check whether we really need the ORDER BY), and compare estimated vs. actual row counts.
  • Have a look at optimizer overrides: review the code to determine whether index hints or NOLOCK clauses are really necessary. These hints could be beneficial in the short term, but such overrides might have a negative impact when data changes happen or the database is migrated to a new version.
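
A hedged sketch of this parameter sniffing workaround (procedure and table names are hypothetical):

    CREATE PROCEDURE dbo.usp_OrdersByStore
        @StoreID INT
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Copy the parameter into a local variable; the optimizer then uses an
        -- average-density estimate instead of the value sniffed at compile time
        DECLARE @LocalStoreID INT = @StoreID;

        SELECT o.OrderID, o.OrderDate
        FROM   dbo.Product_Orders AS o
        WHERE  o.StoreID = @LocalStoreID;   -- refer to the variable, not @StoreID
    END;

Note that this trades a sniffed plan for a generic one; alternatives such as OPTION (RECOMPILE) exist, so treat this as one option rather than the only fix.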

Functionality:

  • Prepare various test cases and make sure all test cases pass. For example, prepare a test case that sends all possible inputs to a procedure, defines the expected output and compares it with the actual output (see the sketch after this list).
  • Check that the code is error-free, parses correctly and executes without any issue.
  • Check that the output result set is correct: number of rows, number of columns, column names, datatypes, etc.
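
A minimal sketch of such a test case, assuming the hypothetical usp_GetStudentGrades procedure from the earlier template sketch:

    -- Expected: 2 rows for StudentID = 1 (the expected value comes from the test case)
    DECLARE @Expected INT = 2,
            @Actual   INT;

    CREATE TABLE #Result (CourseID INT, Grade CHAR(2));

    -- Capture the procedure's output for comparison
    INSERT INTO #Result (CourseID, Grade)
    EXEC dbo.usp_GetStudentGrades @StudentID = 1;

    SET @Actual = (SELECT COUNT(*) FROM #Result);

    IF @Actual = @Expected PRINT 'PASS' ELSE PRINT 'FAIL';

    DROP TABLE #Result;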

Environment Based Settings:

  • Document all environments based settings and follow those instructions in designing the procedure.

Ex 1: In a highly secure database environment, all procedures should call a nested audit procedure which collects all session details and stores them in an audit table.

Ex 2: In some environments, before performing any bulk operation we have to get the latest lookup values into the lookup tables.

Summary:

  • Always define your own standards and follow best practices in designing database code.
  • Prepare an Excel sheet with the required checklist and make sure the SQL developer fills in the sheet before sending the code for review.
  • Initially it might take some extra time in the development phase, but it simplifies the code review process and leads to better productivity.

Freelancing with toptal


One of the best freelancing zones: TOPTAL

Have you ever worked as a freelancer? Recently I came across a unique freelancing site. Let’s see how it is different from other freelancing sites.

  • Here you need not chase clients: you need not worry about bids and all that stuff; just go through the interview process and join the community.
  • Comparatively, you can opt for high billing rates (long term, short term and hourly).
  • You still get to work with the best clients: AXEL SPRINGER, J.P.MORGAN, KDDI America and a lot more.

It says “Hire Top 3% Freelance Developers”, as they follow a rigorous process before allowing a developer into the community. Process to join TopTal:

  • Language & Personality: HR will check your communication and English speaking skills
  • Timed Algorithm Test: Online test
  • Technical Screening: By sharing your screen with their senior engineers
  • Building a Test Application: you need to build this and present the project.

Please have a look here to know more about the hiring process.

The test pattern might change based on your background and technical skills, but broadly it is a 4-step process. Once you clear the interview process, you can be a part of the community. People experienced in scripting and programming languages can give it a try.

Freelancing can add some extra dollars to your income, but that is tiny compared to the knowledge you gain by working on and handling a project / task independently.

By the way, the TopTal engineering team has published a few SQL Server interview questions. Have a look here for those questions and answers.

4 Essential SQL Server Interview Questions

Posted in Miscellaneous | Tagged , , , | 4 Comments