Sunday 6 April 2014

Saving index definitions and dropping indexes for large inserts

I was going to talk about how to capture index definitions so that you can drop indexes and then at a later point re-create them... but I got a bit carried away and wrote an SP that will do it all for you.  Before I share that though, a little something on why I did this.

Recently I've had a couple of large data migrations to do and always to tables that have a number of indexes.  There are two problems with this.

  1. Indexes add an overhead when you are inserting data and Microsoft advise to drop from one to all indexes depending on how much data you are inserting.  Dropping the indexes could add a good deal of speed to your insert.  Here is a guideline table of the indexes to drop per amount of data being inserted.  For example, you should drop the clustered index if the amount of data you are inserting exceeds 30% of the number of rows that already exist in your destination table.  Note how Microsoft advise to drop the indexes, not disable then rebuild.  The table is taken from Guidelines for Optimizing Bulk Import.
IndexesRelative amount of new data
Clustered index only30 percent
Clustered and one nonclustered index25 percent
Clustered and two nonclustered indexes25 percent
Single nonclustered index only100 percent
Two nonclustered indexes60 percent

  1.  Minimal logging can help when inserting large amounts of rows.  There are a few things you can do to avoid your transaction logs filling up, and minimal logging can help.  Plus it could also greatly improve the performance of your insert.  However, minimal logging has a few requirements.  And one of them is to not have a clustered index when inserting to a table that already has data. There are a number of additional requirements and these may vary according to your SQL version, however that's a discussion for another day. 
Suffice to say, with the two above points in mind, it would help to have something to drop your indexes, then rebuild them, without having to manually script out your indexes, then run them in again afterwards.  You'll want something you can incorporate into your migration scripts or package.

 To this end I've written an SP called dbo.DropCreateIndexes.  To use this, you should be aware of the following parameters

  • @Action:
    • Purpose - Define what you want the SP to do, e.g. drop or re-create the the indexes.
    • Options:
      • Drop - Drop the indexes on a table and save the definitions for later use.
      • DropOnly - Drop the indexes without saving the definitions.
      • Create - Recreate the indexes for a defined table, or for all saved indexes.  Definitions are deleted from the IndexDefinition table for indexes that are re-created.
      • SaveOnly - Saves the index definitions to the IndexDefinition table only.
      • Output (default) - Will output the definitions without doing any work.
  • @ShemaName:
    • Purpose - Define the schema to which the table belongs.  Defaults to dbo.
  • @TableName:
    • Purpose - Define the table to save the drop the indexes from, or re-create them to.
The SP will save any definitions to dbo.IndexDefinitions created on whatever database the SP has been created on. As SQL server doesn't seem store index definitions, this procedure will create the definitions from scratch according to what's in various system tables.  Simply use this to drop and save the indexes before your data load, then later re-create the indexes on the same table.

I have tried to cover all bases so this will also handle, multi-column indexes, included columns, filtered indexes, fill factors, table spaces, index padding, allowing row locks and ignoring duplicate keys.  If I've missed anything, be sure to let me know :)

NOTE: I have not fully tested this in all scenarios, nor have I accounted for any indexes other than clustered, non-clustered and primary keys. Please ensure you test this properly before using it.  It still needs some error handling as well.  I expect I'll be making the occasional tweak to it (such as maybe dropping clustered indexes only) as I continue to use it.  Suggestions are welcome.

A user defined function called dbo.concatStrings is also required, the script for which is below as well.  Make sure you install this along with the SP as it is required by the SP for building the create statements.

So here is the SP:

CREATE procedure [dbo].[DropCreateIndexes]
    @Action nvarchar(10) = 'Output',  --'Drop', 'DropOnly', 'Create', 'SaveOnly' or 'Output'
    @SchemaName nvarchar(128) = 'dbo',
    @TableName nvarchar(128)      --Table name or 'All' for create indexes only
   
AS

DECLARE
    @object_id bigint,
    @SQL nvarchar(max);
   
BEGIN
    set @object_id = OBJECT_ID(@TableName,'U');
   
    --CREATE SAVE TABLE:
    if OBJECT_ID('dbo.IndexDefinitions','U') is null
    BEGIN
        create table dbo.IndexDefinitions(
            ID int identity(1,1) not null,
            IndexName nvarchar(128) not null,
            TableName nvarchar(128) not null,
            SchemaName nvarchar(128) not null,
            SaveDate datetime default getdate(),
            CreateSQL nvarchar(max),
        );
    END;

   
    --SAVE INDEX DEFINITIONS FOR TABLE:
    if @Action in ('Drop','SaveOnly','Output')
    BEGIN
        --Delete any rows that exist for the same table if they exist
        delete from dbo.IndexDefinitions
        where TableName = @TableName
            and SchemaName = @SchemaName;
       
        --Create and save the definitions to dbo.indexDefinitions
        with ConcatCols as
        (
            select i.object_id,i.index_id,i.name as IndexName,stuff(
                (select ',[' + name + '] ' + case ic.is_descending_key
                        when 1 then 'DESC'
                        else 'ASC'
                    end
                from sys.columns c
                join sys.index_columns ic on ic.object_id = c.object_id
                    and ic.column_id = c.column_id
                where c.object_id = i.object_id
                    and ic.index_id = i.index_id
                    and ic.is_included_column <> 1
                for xml path(''))
            ,1,1,'')
            as ColumnParts
            from sys.indexes i
            where i.object_id = @object_id
        ),
        includedCols as (
            select i.object_id,i.index_id,i.name as IndexName,stuff(
                (select ',[' + name + ']'
                from sys.columns c
                join sys.index_columns ic on ic.object_id = c.object_id
                    and ic.column_id = c.column_id
                where c.object_id = i.object_id
                    and ic.index_id = i.index_id
                    and ic.is_included_column = 1
                for xml path(''))
            ,1,1,'')
            as IncludedColumns
            from sys.indexes i
        )
        select i.name as IndexName,o.name as TableName,schema_name(o.schema_id) as SchemaName,
            --PRIMARY KEY PART
            case i.is_primary_key
                when 1 then  'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name + '] ADD PRIMARY KEY CLUSTERED ('
            --INDEX PART
                else 'CREATE ' + case is_unique when 1 then 'UNIQUE ' else ' ' end + i.type_desc collate database_default + ' INDEX [' + i.name + '] on [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name + '] ('
            end + cc.ColumnParts + ') ' +
            --INCLUDED COLUMNS
            case
                when inc.IncludedColumns is null then ''
                else 'INCLUDE (' + inc.IncludedColumns + ')'
            end +
            --FILTER DEFINITIONS
            case i.has_filter
                when 0 then ''
                else ' WHERE' + i.filter_definition
            end +   
            --COMPLETE WITH OPTIONS
            ' WITH (' +
                case i.is_primary_key
                    when 0 then
                        'SORT_IN_TEMPDB = OFF,' +
                        'ONLINE = ON,' +
                        'DROP_EXISTING = OFF,'
                    else ''
                end +
                'PAD_INDEX = ' + case i.is_padded when 1 then 'ON' else 'OFF' end + ',' +
                'STATISTICS_NORECOMPUTE = ' + case i.is_padded when 1 then 'ON' else 'OFF' end + ',' +       
                'IGNORE_DUP_KEY = ' + case i.ignore_dup_key when 1 then 'ON' else 'OFF' end + ',' +       
                'ALLOW_ROW_LOCKS = ' + case i.allow_row_locks when 1 then 'ON' else 'OFF' end + ',' +
                case i.fill_factor when 0 then '' else 'FILLFACTOR = ' + cast(i.fill_factor as varchar(3)) + ',' end +
                'STATISTICS_NORECOMPUTE = OFF)' +
            ' ON [' + fg.name + ']' as CreateSQL
        into #tempDef
        from sys.indexes i
        join sys.objects o on o.object_id = i.object_id
        join ConcatCols cc on cc.object_id = o.object_id
            and cc.index_id = i.index_id
        join sys.filegroups fg on fg.data_space_id = i.data_space_id
        left join includedCols inc on inc.object_id = i.object_id
            and inc.index_id = i.index_id;
           
           
        if @Action in ('Drop','SaveOnly')
        BEGIN
            insert into dbo.IndexDefinitions(IndexName,TableName,SchemaName,CreateSQL)
            select * from #tempDef;
        END;
       
        if @Action in ('Output')
        BEGIN
            Select * from #tempDef;
        END;
       
    END;
       
   
    --DROP ALL INDEX ON TABLE:
    if @Action in ('Drop', 'DropOnly')
    BEGIN
        select @SQL = (
            select case
                when i.is_primary_key = 1 then   
                    'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name +'] DROP CONSTRAINT ' + i.name + ';'
                else
                    'DROP INDEX [' + i.name + '] ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name +'];'
            end
            from sys.indexes i
            join sys.objects o on o.object_id = i.object_id
            where o.object_id = @object_id
            for xml path ('')
        );       

        EXEC sp_executesql @SQL;
               
    END;
   
    if @Action = 'Create'
    BEGIN
        select @SQL = dbo.concatStrings(@TableName,@SchemaName);

        EXEC sp_executesql @SQL;   
       
        delete from dbo.IndexDefinitions
        where TableName = case @TableName when 'All' then TableName else @TableName end
            and SchemaName = case @TableName when 'All' then SchemaName else @SchemaName end;
       
    END;       
END;

And now for the function:

CREATE function [dbo].[concatStrings](@TableName nvarchar(128), @SchemaName nvarchar(128) = 'dbo')
returns nvarchar(max)

as

BEGIN
    declare @x varchar(max)
   
    select @x = coalesce(@x+';','') + CreateSQL
            from IndexDefinitions
            where TableName = case @TableName when 'All' then TableName else @TableName end
                and SchemaName = case @TableName when 'All' then SchemaName else @SchemaName end
           
    return @x;
END;

There you go.  Enjoy.  Any feedback is welcome.

*EDIT*
Here's an example:
exec dbo.DropCreateIndexes
    @action = N'Create',
    @TableName = N'dropIndexTest'

No comments:

Post a Comment

Deploying out of hours shouldn't be a thing

Getting code into production should not be hard. Nor should it have to be scheduled for special times. I have recently been involved with th...