SQL Server Production Issues

SQL Server Production Issues

Prod Problems and Solutions:

     Below are the common problems from production environment.

  •  SQL Server Log full alert 
  • SQL Server Blocking Alert
  • Low Disk Space/Disk Full
      Now here are the solutions for these issues…………….

 

 SQL Server Log full alert
________________________________________

Title

SQL Server log full

 Issue/Problem Description

Log is full on TempDB or a user database. This is most often caused by a process has run longer than expected, has hung or performed a rollback because there was insufficient space in the database to proceed.

Action/Resolution Information

For User Databases

1. Verify drive space is not causing the problem.
2. Check SQL Agent jobs to see is one is causing the problem.

3. Locate running process verify if:
a. Are still running
b. Have completed already.
c. Are rolling back or have rolled back.

4. Check DB settings for caps on the files.
a. If a max size is set, expand the max size by a small amount to allow the process to finish running. Make sure that other files on the drive are not impacted.
b. If a max is not set and the drive is running out of space notify the customer of the findings and have them consider the following:
** Expand to another location. (Make sure to cap the file as well before expanding to
another location to prevent the file from taking all of the disk space left.)
** Grow the drive if it is on Storage Utility
5. If the following hold true attempt to shrink the log:
a. The process has completed or rolled back
b. There is free space on the Disk
c. There is free space to be gained in the log

For TempDB

1. Identify if TempDB is setup with multiple files.

2. Verify DB settings.

a. If there is a maxcap on the tempdb file and there is adequate, disk space increase it up a small amount to allow the process to complete.

IMPORTANT – If this TempDB is set up with several TempDB files you need to add a little space to each file equally to maintain the correct algorithm.

b. If there’s no cap set on the database.

** Verify drive space If there is a lot of space available than the process may have

already rolled back.

** Identify process using tempdb and what is driving them.

** Locate alternate location to expand files

(Use extreme caution when expanding to another drive. It will affect the other databases on the server. Try to expand only long enough to get the process completed and only to a drive other than E-H.)

As a last resort and only when SQL is negatively impacted by the condition consider requesting permission from the customer to restart SQL.

 

Blocking Alert
________________________________________

Title

SQL Server Blocking

Issue/Problem Description

On occasion SQL server will throw alerts for blockage. A lot of times this blockage will pass on its own. In those cases let the customer know the blockage occurred and that it has passed. If it continues however this could indicate a larger problem.

Action/Resolution Information

Blocking often occurs because:

1. A SPID holds locks on a set of resources for an extended period of time before releasing them. This type of blocking resolves itself over time, but can cause performance degradation.(This type will often resolve itself however can cause a moving target of sorts as the blocking moves from one resource to another making this harder to troubleshoot overall)

2. A SPID holds locks on a set of resources and never releases them. This type of blocking does not resolve itself and prevents access to the affected resources indefinitely.(This type is fairly consistent and easy to troubleshoot normally)

Running SQLStat will verify blockage is still occurring after you receive the alert:

CONNECTING TO SQL SERVER:

SERVER_NAME 9.00.1355

TOTAL PROCESSES: 26

BLOCKED PROCESSES: 2

Once you have verified that the blockage is still occurring you need to identify the following:

1. The SPID causing the blocking.

2. The command being executed by the SPID.

3. If the SPID is still active.

4. What process is controlling the SPID.

 

You can either view things using Query Analyzer or Enterprise Manager. Since some tempDB blocking can prevent you from viewing the process’ through Enterprise Manager it is good to be aware of which query to run to get the same information.

 

 

From Query Analyzer you can run any of the following to get SPID information:

 

 

** SP_WHO2 – BlkBy column

** SP_LOCK – SPID Column

** SELECT * FROM MASTER..SYSPROCESSES

** SELECT * FROM MASTER..SYSLOCKINFO

** SELECT * FROM SYS.DM_TRAN_LOCKS – SQL 2005 only-

Use WHERE resource_database_id = clause for specific database.

 

 

A SPID value of -2 indicates an orphaned transaction.

 

 

Take these blockers and run the following to get the detail of what they are running:

 

 

DBCC INPUTBUFFER(spid)

 

 

Identify I/O consumption on long running tasks to verify still active:

 

 

SELECT SPID, PHYSICAL_IO FROM MASTER..SYSPROCESSES

 

 

OR

 

 

Connect via Enterprise manager and view the following

1. Expand Group

2. Choose Server

3. Expand Management

For SQL 2000

** Expand Locks/Process ID

** Click Process Info

** Double Click the blocking Spid to capture the last command executed.

For SQL 2005

•Double click on Activity Monitor

** Scroll to the right and locate the “Blocked By” columns. These are your head

blockers.

** Double click on Process ID to make note of what activity is causing the blocking.

** Note Process ID number for the head of the blocking chain.

 

 

Once you have identified the running command and verified it is still active look at some of the following:

 

 

If a SQL Agent job is spawning the command to run.

 

 

1. If the job is running over, notify customer of situation and to check

2. If the job is not running over, notify customer and monitor till expected job end

3. If this is a daily occurrence you may suggest they choose a different time to run their job when the server is under fewer loads and it will not cause blocking.

 

 

Not a SQL job but ad-hoc query – Notify customer and person running query of the problem and request they

 

 

1. Break any large, complex queries into simpler queries

2. Run the query during off hours or on a separate computer

 

 

Remember that certain SPIDs should not be killed or cannot be killed.

 

 

**Your own process – Cannot be killed

 

 

Those labeled with the following should not be killed:

 

 

** AWAITING COMMAND

** CHECKPOINT SLEEP

** LAZY WRITER

** LOCK MONITOR

** SIGNAL HANDLER

 

 

If blocking is persistent and an attempt to kill the SPID has left it hung you may need to either:

1. Restart SQL

or

2. Reboot Server

 

 

Run the following from a Query Analyzer to get an idea of how long a SPID may take to rollback:

 

 

KILL spid WITH STATUSONLY

 

 

In order to clear the issue. Make sure and discuss this with the customer first.

 

 

 

 

Disk Space Alert

________________________________________

 

 

Title

 

 

Low Disk Space/Disk Full

 

 

Issue/Problem Description

 

 

Disk threshold alerts

 

 

There are two disk threshold alerts we are currently trapping:

1. Below 400mb (Sends email to MSE every 15 minutes)

2. Below 5% (Sends email to MSE every 15 minutes)

 

 

Drive Notations as Per  General Production Standards

 

 

C Drive – All OS Files present.

D Drive – All the System Database files (log & data files) present in D:\MSSQL\Data folder.

E Drive – All the Database Backup files (Full & Differential) present in E:\MSSQL\BAK folder.

F Drive – All the Transaction Log Backup files present in F:\MSSQL\TRAN folder.

T Drive – All the data files of Tempdb database present in T:\MSSQL\DATA

H Drive – All the database files (only Data) present in H:\MSSQL\DATA.

O Drive – All the Log files present in O:\MSSQL\DATA. .

 

 

Action/Resolution Information

 

 

Identify the server type

 

 

Since we monitor several different types of servers, each with its own SKU, it is important to identify the type of server throwing the error to make sure you understand what each drive’s purpose is.

 

 

Common:

 

 

** File servers

** SQL Servers

** IIS Servers

** Servers on SANs or SU’s

 

 

Common causes

 

 

** Look for the obvious files first:

1. *.tmp

2. *.dmp

3. *.log

** Look for failed jobs within SQL or Task Scheduler that may be causing the drive to fill.

** Check the Print Spooler if this server does a lot of printing. (A large print job that hung can cause the C$ drive to fill quickly. Commonly this folder is located

C:\$systemroot$\system32\spool\PRINTERS.)

** Check for SMS Cache that has grown large.
Actions you can consider

For C Drive alerts:

** Contact DCOps for resolution.

For other drives:

**Contact the customer and ask them to remove any unnecessary files/folders.

** Compressing old files and/or the folder.

** Special Tape archive and delete.

a. Create a share where the data is located.

b. Create a Change Request for the Backup team to backup the newly created share to

tape and send offsite.

c. Verify successful tape save with Backups.

d. Delete data off server.

** Shrinking the files if they are from a SQL database.

a. Identify if there is available space on the database to shrink it down a bit.

** Delete completely. (Mostly refers to C$)

a. Old dump files from Windows and DrWatson.

b. Old Install bits from updates and upgrades.

** Move Print Spool to another drive.

** If this is on a Storage Utility, you may consider adding more drive space.

 

 

!! Always confirm!!

 

 

Always confirm with the customer before moving, removing or altering ANY of their data from a server.

 

Will be back with More probs and Sols…………………..!!!!!.

Posted in SQL Server DBA | Tagged , , , , , , , , , , | 2 Comments

Leave a Reply

2 Comments on "SQL Server Production Issues"

Notify of
avatar
Sort by:   newest | oldest | most voted
Rachel
Guest
RE: Disk space issues Why is it that you have both the log files and data files in the same drive (D)? Dont you think that itself could be a problem for disk space issues ? If the databases are small and we are sure that their growth is going to be insignificant then we can let go otherwise there lies your problem of disk contention. Apart from these I think your article is a really good one for beginners and mid-level dbas. I havent found that many articles pointing towards a step-by-step information on production troubleshooting. Wish to see… Read more »
Uday Chowdary Arumilli
Guest

Hi Rachel, Sorry for the late reply.

By the way I forgot to mention “system” instead I have mentioned that all data files. All system data files has to be located on D drive.

Thanks for your valuable time

wpDiscuz