SQL Server files and File Groups
This is to understand SQL Server files and File Groups. It’s ironic that the Relational Database Management System (RDBMS) was invented to implement data structures in something other than files. The irony is that even though the RDBMS doesn’t work with independent text files, files are still at the heart of the engine. We’ll see how SQL Server files and File Groups works.
But it is files all the way down. SQL Server makes use of two types of files. Data is first stored in a transaction log, which is one type of file, and then subsequently written to the second type, the database file. When a user enters a transaction, the data inserts, updates, or deletes is sent along to the RDBMS engine, which records that line of data in the transaction log. The log is stored in a file with the extension of LDF (Log Database Format or Log Database File). Once the server has a spare moment, a process spins up to write that line onto the database, which stores the data in a file with the extension MDF (Microsoft Database Format or Microsoft Database File).
You can assign more than one file to a database, log, or both. When you create a database, the system automatically creates the first two files, one for data and another for logs. But you should often create other files for the database to use.
SQL Server files and File Groups
Reasons:
One reason that you might want to separate files is for drive use. You might simply not have enough space for the data in a specific table or group of tables to fit on one drive letter.
Another reason to add multiple files is to simplify maintenance. If your database is quite large, you might not be able to complete a backup or re-index operation within a single day. SQL Server allows you to create multiple files, group them together into a filegroup, and then perform maintenance on the filegroups individually.
Still another reason to create multiple files, or at least to set certain files onto their own set of hard drives, is for performance. By placing indexes on separate drives than data, the INSERT operations are faster for those files.
Files and filegroups in sql server 2005
introduction:
With small to medium databases backup and restore operations are straightforward. You can generate full, differential and transaction log backups and restore them in the sequence they were taken. When you have a terabyte caliber database you face a different scenario because restoring a full backup can take many hours. While the database is being restored, no user can connect to the database. This can be particularly frustrating when the majority of your data is static and only a small portion is updated daily. Fortunately SQL Server 2005 Enterprise Edition (and later editions) provides an answer to this challenge through piecemeal restore.
Previous versions of SQL Server allowed database administrators to backup individual files and file groups instead of backing up the entire database. However, before the database becomes available you must restore every file and file group. This solution allows reducing the total backup time, but doesn’t help with database availability.
Get Online Faster When Disaster Strikes
With SQL Server 2005, the database can be available for querying after its PRIMARY file group has been restored. You can continue restoring other files and file groups while users are querying the database. Only the portion of the database being restored will not be available to the users. In fact, if you don’t have the backup files immediately available or if you don’t want to impose any extra overhead on the system, you can wait as long as you want before restoring the rest of the file groups. The file groups that haven’t been restored remain in offline state.
Plan Ahead: Design Your Filegroups For Faster Restores
Things get more complicated when you want to restore a single filegroup from a backup.
Let’s say we have a 1 terabyte data warehouse. If our SQL Server goes down, we can’t restore 1 terabyte of data fast enough to make sure we don’t get fired, but we can’t afford a hot standby system. SQL Server 2005’s new filegroup backup & restores give me a way around that.
First, long before disaster strikes, we have to break it up into a series of filegroups for easier management:
- 500gb filegroup with old sales data (more than 1 year old, and that data doesn’t change, so we’ve made it read-only)
- 400gb filegroup with old payroll data (also more than 1 year old)
- 100gb primary filegroup with current sales & payroll data (stuff in the last year, and all the current data goes in here)
When disaster strikes, here’s what SQL 2005’s new filegroup restore lets us do:
- Restore the primary filegroup with the current data in a matter of minutes, and put the database online. The users can query, but if they try to query more than a year ago, they’ll get an error message.
- Restore the old sales filegroup while the users are already querying the current data. We want this faster than payroll, because we only do payroll every two weeks, but the sales guys may want to query old sales data faster.
- Restore the old payroll filegroup, bringing the database fully online.
But keep in mind that we’re talking about complete disasters here, like when the server craters altogether and we have to start with a restore of our primary filegroup.
Filegroup Design Tips and Tricks
Since the primary filegroup must come online first, keep the primary filegroup relatively small. If the database is one terabyte, don’t create a 900 gigabyte filegroup as the primary file, because filegroup restores won’t be much faster than conventional restores.
Create a primary filegroup with the most urgently needed tables: configuration tables, user security, and whatever your application absolutely must have in order to run.
Then create a secondary filegroup with the most commonly queried tables: customers, items, warehouses, employees, whatever data is relatively small and helpful.
Finally, create additional secondary filegroups with large tables that are not queried as frequently, like archived data or reporting tables.
Moving files to different filesgroups in sql server 2005
In SQL Server 2005, a new clause ‘MOVE TO’ was added to the ALTER TABLE command to allow for moving a table to a different filegroup. MOVE TO clause is used along with DROP CONSTRAINT clause in the ALTER TABLE syntax. When a primary key constraint or a constraint which created a clustered index is dropped, leaf level data rows of the clustered index are placed in non clustered table. In SQL Server 2005, when clustered index is dropped (By dropping constraint that created clustered index), one can move table to new filegroup or partition scheme in same transaction by using this new ‘MOVE TO’ option. Let us see this with the help of an example.
Example:
ALTERTABLE oneC DROPCONSTRAINT PK_oneC WITH(MOVETO HISTORY)
GO
ALTERTABLE oneC ADDCONSTRAINT PK_oneC PRIMARYKEY(id)
Note: Here onec is the table and History is the filegroup
We need to keep in mind the following restrictions when the ‘MOVE TO’ option is used:
• MOVE TO is not valid for indexed views or non-clustered indexes.
• The partition scheme or filegroup must already exist.
• If MOVE TO is not specified, the table will be located in the same
partition scheme or filegroup as was defined for the clustered index.
Piecemeal restoration sql server 2005
With each release of the software Microsoft has made great strides in improving their primary database platform. SQL Server 2005 is a true enterprise class database platform capable of supporting scalability and availability needs of the largest and most complex databases. Not the least of database availability improvements introduced with SQL Server 2005 is the piecemeal database restore functionality I will discuss in this tip.
SQL Server backup types
Before I talk about backup and restore functionality improvements with SQL Server 2005 let me discuss the three main types of backups available with all versions of software:
- Full backup – makes a copy of all data pages. A data “page” is basically a storage unit.
- Differential backup – makes a copy of only those data pages that have changed since the last full backup.
- Transaction log backup – makes a copy of all transactions that occurred since the last full, differential or transaction log backup.
A typical backup strategy takes advantage of the combination of these three types of backups.
The full backups are the simplest; they give you a full snapshot of your data; your entire database can be restored using a single full backup file in a single step. Unfortunately full backups are not practical to for very large databases. If a full backup of your database takes 23 hours to complete you cannot afford to run such backups daily. Typically you’d try to create a full backup daily, weekly or monthly (depending on the size of your database) and supplement these with transaction log and differential backups.
Differential backups are convenient because they only make a copy of those data pages that have changed since the last full backup. Therefore they are considerably smaller than full backups. Note however, that a differential backup does not copy the pages changed since the last differential backup – as many people mistakenly think; rather it copies all pages changed since the last FULL backup. So if you take a full backup on Sunday and then run differential backups during the week then Monday’s differential backup will be smaller than or equal size of all other weekdays’ differential backups. To restore a database you will need both full and differential backup files.
Transaction log backups take significantly less time to create than full or differential backups; they are also considerably smaller. On the other hand, to restore a database using transaction log backups you must have a full backup, the last differential backup (if you use differential backups at all) and all transaction log backups taken since the last full or last differential backup.
Filegroup backups
In addition to backing up the entire database you can backup individual files and file groups. A file group is simply a collection of data files. Transaction log files do not belong to any file groups. Each database contains a primary file group. In addition you can create one or more secondary file groups. As the best practice it is recommended that you place all user-defined (non system) objects into secondary file groups. The primary file group should be reserved for only system objects.
With previous versions of the software you could use file group backups to effectively shorten your backup window. You could backup a subset of the database much quicker than it would take to backup the entire database. However, when recovering a database you’d normally have to restore all file groups prior to making the database available to your users. The exception to this rule is the partial restore scenario which allows you to restore the primary file group and one or more secondary file groups.
The partial restore functionality with SQL Server 2000 has two drawbacks, however:
- It must be performed from a full database backup (you can specify which file groups you wish to choose from the full backup). You cannot use file group backups for performing the partial restore.
- Once you recover the subset of the database you cannot subsequently restore the rest of the database. As the name indicates a partial restore is meant for restoring a part of the database.
SQL Server 2005 extends the partial restore functionality with Piecemeal Restore. Piecemeal restore allows you to utilize file group level backups to restore each file group. Furthermore, as long as your primary file group is available users can resume their activities while the other file groups are being restored. Piecemeal restore enables you to recover the entire database as your schedule allows.
When using partial or piecemeal restore it is necessary to restore the primary file group first because it contains system level meta data; with previous versions meta data was stored in system tables, with SQL Server 2005 system tables (like syscolumns, sysindexes, etc) are implemented as views. They are also deprecated – for querying system meta data we should use dynamic management views instead. The secondary file groups can be restored in any order.
Please note that the piecemeal restore functionality is only supported by the Enterprise Edition of SQL Server 2005. The following section provides an example of how piecemeal restore is actually implemented.
SQL Server piecemeal restore
For our example scenario let’s suppose we have a data warehouse that tracks sales data for transactions that occurred during each year from 1996 until now. Let’s further suppose that all historical sales except ones that occurred within the current year are stored in the “history” file group. Historical data is static – we never change any records in that file group. As good database administrators who adhere to the best practices religiously we have placed all user-defined objects in secondary file groups and not in the primary file group. Since data in the “history” file group never changes we only back it up once a month. The “current” file group stores the sales for the current year and gets backed up daily.
One fine morning we get a phone call from the operation’s department – database is down. Our server experienced some hardware issues and we need to recover the database to another server. The “current” and “primary” file groups are relatively small and could be restored within an hour. The “history” file group could take many hours to restore. Fortunately for us our users are mainly concerned about the recent transactions and only examine old data occasionally. So we decide to use the piecemeal restore functionality to make the database available to the users ASAP.
Tips on fileGroups
While filegroups provide an opportunity for fine-tuning performance by allowing you to move specific tables and indexes from one physical drive array to another, unless you have a lot of SQL Server experience and have a very large database, you may end up causing more performance problems than you fix.
As a general rule of thumb, don’t try to manually assign tables and other objects to filegroups to reside on specific physical drives. Instead, let RAID 5, RAID 10, or virtual SAN devices do this for you automatically. In the long run, you will probably get better overall performance.
*****
If you know for sure that a very large table in your database will always be accessed sequentially, consider putting this table in its own filegroup, and then locating it on a disk array by itself. Be sure you don’t put more than one table in the filegroup and ensure that no other files exist on that disk array. By doing this, you can take advantage of the disk array’s ability to operate sequentially to retrieve the data. Sequential access is always faster than random access.
The hard part is to identify if a table is accessed sequentially or not. Generally speaking, if the table is mostly read-only, and if it has a proper clustered index, and if the clustered index is used in the queries accessing the table, access can be considered to be sequential.
As you might guess, this a very arcane performance tip and one that you will rarely be able to take advantage of in the real world.
*****
If your database is very large and very busy, multiple files can be used to increase performance. Here is one example of how you might use multiple files. Let’s say you have a single table with 10 million rows that is heavily queried. If the table is in a single file, such as a single database file, then SQL Server would only use one thread to perform a read of the rows in the table. But if the table were divided into three physical files, then SQL Server would use three threads (one per physical file) to read the table, which potentially could be faster. In addition, if each file were on its own separate physical disk or disk array, the performance gain would even be greater.
Essentially, the more files that a large table is divided into, the greater the potential performance. Of course there is a point where the additional threads aren’t of much use when you max out the server’s I/O. But up until you do max out the I/O, additional threads (and files) should increase performance.
******
For VLDB, use filegroups to divide up the database into units that are relatively easy to backup and restore. It is virtually impossible to manage VLDB backups and restores in a timely manner without judicial use of filegroups.
*****
For VLDB, tables and their related indexes should be separated onto separate files and physical disks for optimum performance running queries, but not separate filegroups. If they are in separate filegroups, then you cannot back up and restore them as a single unit.
.
*****
If you use filegroups for your databases, and your application also uses either TEXT, NTEXT, or IMAGE datatypes, consider moving the tables that contain these data types into one or more files in their own filegroup, and then place this filegroup on its own disk array. TEXT, NTEXT, and IMAGE datatypes can create a large I/O burden on your server, and if you can move this data to its own array, you can help speed up access to it. Obviously, this will be a lot of work, but if your database is very large and your application is running slowly, you may want to consider this option. The more you can reduce disk contention, the faster your application can run.
*****
A simple way
to manage file groups
/* SQL Server 2005 Testing */
/* CREATE OUR TEST DATABASE */
—————————————————————————————————CREATEDATABASE [MultipleFilesDB] ONPRIMARY
(NAME= N’MultipleFilesDB_Primary’,FILENAME= N’e:\MultipleFilesDB_Primary.mdf’,SIZE= 4096KB ,FILEGROWTH= 10%),
FILEGROUP [SECONDARY]
(NAME= N’MultipleFilesDB_Secondary’,FILENAME= N’e:\MultipleFilesDB_Secondary.ndf’,SIZE= 4096KB ,FILEGROWTH= 10%)
LOGON
(NAME= N’MultipleFilesDB_log’,FILENAME= N’e:\MultipleFilesDB_log.ldf’,SIZE= 1024KB ,FILEGROWTH= 10%)
GO
—————————————————————————————————
—————————————————————————————————
/*** SET DB COMPATABILITY LEVEL TO SQL SERVER 2000 – 80 / 2005 -90 / 2008 – 100 ***/
EXEC dbo.SP_DBCMPTLEVEL @dbname=N’MultipleFilesDB’, @new_cmptlevel=90
GO
/*** KEEP RECOVERY MODEL AS FUUL ***/
ALTERDATABASE [MultipleFilesDB] SETRECOVERYFULL
GO
/*** SET PRIMARY FILEGROUP AS DEFAULT FILEGROUP ***/
USE [MultipleFilesDB]
GO
IFNOTEXISTS(SELECT groupname FROM dbo.sysfilegroups WHERE(status& 0x10)!= 0 AND
groupname = N’PRIMARY’)
ALTERDATABASE [MultipleFilesDB] MODIFYFILEGROUP [PRIMARY] DEFAULT
GO
—————————————————————————————————
—————————————————————————————————
/*** CREATE A TEST TABLE ON EACH FILEGROUP ***/
USE [MultipleFilesDB]
GO
SETANSI_NULLSON
GO
SETQUOTED_IDENTIFIERON
GO
CREATETABLE [dbo].[OrdersPrimary](
[OrderID] [INT] IDENTITY(1,1) NOT NULL,
[OrderDate] [DATETIME] NULL,
CONSTRAINT [PK_OrdersPrimary] PRIMARYKEYCLUSTERED
(
[OrderID] ASC
)ON [PRIMARY]
)ON [PRIMARY]
GO
CREATETABLE [dbo].[OrdersSecondary](
[OrderID] [INT] IDENTITY(1,1) NOT NULL,
[OrderDate] [DATETIME] NULL,
CONSTRAINT [PK_OrdersSecondary] PRIMARYKEYCLUSTERED
(
[OrderID] ASC
)ON [SECONDARY]
)ON [SECONDARY]
GO
/* Populate the first order in each table */
INSERTINTO dbo.OrdersPrimary (OrderDate)VALUES(GETDATE())
INSERTINTO dbo.OrdersSecondary (OrderDate)VALUES(GETDATE())
—————————————————————————————————
/*** BACKUP AND RESTORE THE FILES AND FILEGROUPS ***/
/* DO A FULL BACKUP */
BACKUPDATABASE [MultipleFilesDB] TODISK= N’e:\MultipleFilesDB.bak’
WITHNOFORMAT,INIT,NAME= N’MultipleFilesDB-Full Database Backup’,
SKIP,NOREWIND,NOUNLOAD,STATS= 10
GO
/* Add another set of rows so we can tell if the t-logs have been applied */
INSERTINTO dbo.OrdersPrimary (OrderDate)VALUES(GETDATE())
INSERTINTO dbo.OrdersSecondary (OrderDate)VALUES(GETDATE())
/* Do a transaction log backup */
BACKUPLOG [MultipleFilesDB] TODISK= N’e:\MultipleFilesDB.trn’
WITHNOFORMAT,INIT,NAME= N’MultipleFilesDB-Transaction Log Backup’,
SKIP,NOREWIND,NOUNLOAD,STATS= 10
GO
/*** NOW WE ARE ABLE TO RESTORE THE FILES AND FILEGROUPS FROM FULL BACKUP ***/
/* Restore just the primary filegroup */
USEmaster
GO
RESTOREDATABASE [MultipleFilesDB]
FILE= N’MultipleFilesDB_Primary’
FROMDISK= N’E:\MultipleFilesDB.bak’
WITHNORECOVERY
GO
/*** NOW REFRESH THE DATABASES IT IS BEEING SHOWN AS “MIDDLE OF RESTORE” :***/
/* Restore just the Secondary filegroup */
RESTOREDATABASE [MultipleFilesDB]
FILE= N’MultipleFilesDB_Secondary’
FROMDISK= N’E:\MultipleFilesDB.bak’
WITHNORECOVERY
GO
/*** NOW REFRESH THE DATABASES IT IS BEEING SHOWN AS “MIDDLE OF RESTORE” :***/
/* Restore just the LOG Backups */
RESTORELOG multiplefilesdb
FROMDISK= N’e:\MultipleFilesDB_Log1.trn’
WITHNORECOVERY
RESTORELOG multiplefilesdb
FROMDISK= N’e:\MultipleFilesDB_Log2.trn’
WITHNORECOVERY
RESTORELOG multiplefilesdb
FROMDISK= N’e:\MultipleFilesDB_Log3.trn’
WITHNORECOVERY
RESTORELOG multiplefilesdb
FROMDISK= N’e:\MultipleFilesDB_Log4.trn’
WITHRECOVERY
/* Refresh your list of databases in SSMS, and presto, the database is online ***/
SELECT*FROM MultipleFilesDB.dbo.OrdersPrimary
SELECT*FROM MultipleFilesDB.dbo.OrdersSecondary
Note:
Here we just get the full backup and then Tran log. From these we have restored all the file groups. In case our database size is in terabytes we can’t go for full database backup. We simply get the filsegroup backups and we can restore the individual filegroups. Remember that the backup process for filegroups is same as full backup even the extension is same as “.bak”.
If we observe that for the both primary and secondary file groups we are restored from same backup instead we just specify the file group backup names when we are restoring from filegroup backup files.
Hierarchy of Restoration
Full backup
File1 – filegroup1
File2 – filegroup1
File1 – filegroup2
File1 – filegroup3
———————-
Piecemeal restoration in sql server 2005
Only the difference is that we use PARTIAL keyword.
Creation of database and tables is same as above we will see how to backup filegroups and how to restore from it.
USEMASTER
BACKUPDATABASE [TeraWareHouse]
TODISK= N’D:\UdayA\MyFolder\FULLBACKUP.BAK’
WITHNOFORMAT,INIT,
NAME= N’terawarehouse-Full Database Backup’,SKIP,NOREWIND,NOUNLOAD,
STATS= 10
GO
/*** Observe the PRTIAL keyword ***/
USEMASTER
RESTOREDATABASE TeraWarehouse_restored
FILE= N’terawarehouse’
FROMDISK= N’D:\UdayA\MyFolder\FULLBACKUP.BAK’
WITHNORECOVERY,PARTIAL,
MOVE N’TeraWarehouse’TO
N’D:\UdayA\MyFolder\TeraWarehouse_restored.mdf’ ,
MOVE N’TeraWarehouse_log’TO
N’D:\UdayA\MyFolder\TeraWarehouse_log_restored.ldf’
/**** Now the database is avilable online and users can access the primary filegroup to querying ***/
/*** Restore the FileGroup “Current” ***/
RESTOREDATABASE TeraWarehouse_restored
FILE= N’currentdata’
FROMDISK= N’D:\UdayA\MyFolder\FULLBACKUP.BAK’
WITHNORECOVERY,
MOVE N’currentdata’TO
N’D:\UdayA\MyFolder\currentdata_restored.ndf’
/*** Restore the FileGroup “History” with recovery***/
RESTOREDATABASE TeraWarehouse_restored
FILE= N’olddata’
FROMDISK= N’D:\UdayA\MyFolder\FULLBACKUP.BAK’
WITHRECOVERY,
MOVE N’olddata’TO
N’D:\UdayA\MyFolder\olddata_restored.ndf’
/*** Now the database is totally online ***/