http://my.advisor.com/doc/16671
Q: We've been having problems with disk space on our SQL Server. I've been deleting old, unimportant data, but that isn't enough. I set up additional hard drive space on a separate drive, and I know how to create a new filegroup for the database to use. Is there a way to move existing tables there?
-- Mark McGibney, London, England
A: There are two issues here: One is the disk space issue, and the other relates to using filegroups. The two aren't the same thing. If all you want to do is increase available disk space for your database, you can do that without creating new filegroups. The real purpose of filegroups is to create logical relationships between multiple database files (likely on separate hard drives). You can administer these relationships together. All databases have at least one filegroup -- the primary filegroup -- which is also the default.
So to deal with your primary problem of getting more disk space for your database, you can create a new secondary data file on your hard drive (it'll have the extension .NDF, as opposed to the primary data file with the .MDF extension). When you create the new file, it automatically goes into the primary filegroup and is available for all tables to use.
Deliberately distributing tables across multiple drives is a pretty straightforward performance trick. Of course it only makes sense if you're using the table in such a way that it's likely to need multiple drives; you need to access different areas of the table simultaneously for there to be any benefit. Also you can get about the same benefit using a RAID array (whether 0, 1, or 5) without logically complicating your database. I'm prone to the RAID 5 solution myself; it's cost-effective, simple to operate, and provides extra data protection for the database, because you can have one hard drive fail and the system continues to function normally (albeit with a lot of beeping about the failed drive).
But suppose you don't want all tables distributed to all the database files? Perhaps one particular table (or tables) is the primary bottleneck of your application, and you want to focus on a solution for just that one part of the database. That's where filegroups come in. You can create a new filegroup, assign database files to it, then assign tables to it, and only the tables in that filegroup have access to your distributed database file structure.
It's easy to create a table in a specific filegroup. At the end of the CREATE TABLE statement, add an ON clause, specifying the filegroup, like this:
CREATE TABLE foo (mycol1 int) ON myFileGroup
But what if you have an existing table? The easiest trick I know to move a table from one filegroup to another is to create (or recreate) a clustered index on the table with the ON clause. A clustered index is the physical order of the table itself, unlike unclustered indexes, which are like their own little tables that contain only the columns you specify for the index plus a RID (row identifier). So when you create a clustered index on a new filegroup, it forces SQL Server to move the entire table to the new filegroup.
This is no more arduous than creating a clustered index on the default filegroup, but that doesn't trivialize it. When you create a clustered index, you're essentially rewriting the entire table in a new order. If there are many millions of rows, expect it to take awhile. And it's in your best interest to do a backup of the database before doing such serious operations.
As far as unclustered indexes go, you can improve performance by putting unclustered indexes in a separate filegroup from the actual table. That presumes the two filegroups don't share any hard drives (and preferably don't share any controllers either) and that your queries involve both the table and the index at the same time. In that case, because SQL Server can search both the index and table simultaneously using separate drives, it runs faster.
In SQL Server 2005 we get much smarter filegroups with the advent of partitioning schemes. In partitioning schemes, you can specify what parts of a table go into what filegroups. This lends a lot more control to using multiple drives across multiple filegroups, as well as being able to restrict what tables use multiple filegroups.
I've had situations where using multiple filegroups has worked out great, but you have to do your research first. Use SQL Profiler to study your database's behavior and lots of query analysis to show what the queries do. When you find a situation where the velocity and frequency is high and the query plan makes sense, you benefit greatly from using multiple filegroups.
But it can go the other way too. Using lots of separate database files and filegroups can create so many parallel disk threads that the cost of handling the threads outweighs the benefits of distributing data. Just be careful out there -- take your time and test thoroughly before going filegroup crazy. This is another reason why investing in more advanced hardware -- such as high-performance RAID controllers -- makes a lot of sense: there's very little downside to them and you still have the logical organizational options as well.
No comments:
Post a Comment