Saturday, 20 December 2014

IN / NOT IN vs EXISTS / NOT EXISTS vs INTERSECT / EXCEPT vs JOIN (Part 2)

EXISTS and NOT EXISTS:

We've already discussed IN and NOT IN in part 1.  We'll use the same examples tables from there, so please run those statement to create the temporary tables that we'll be using.  Everything below was done in SQL Server 2012.

EXISTS:

Exists is slightly different to IN.  It returns a true predicate as long as one row is returned from within the subquery, whereas an IN statement will match all values from the outer query to the values from the inner subquery.  So you should be able to see right away the benefit of using EXISTS over an IN statement.  All that SQL Server needs to do is find one row, then it can stop searching for further matches within the inner subquery.  This is commonly referred to as short circuiting and can make for a much faster query.

EXISTS also allows you to match against more than one column, whereas you would have to use multiple IN statement to achieve the same thing with IN, which would be very costly.  Have a look at the below example.  Try adding a second predicate in the subquery.

select *
from #products
where exists
(
select *
from #orderItems
where productName = #products.productName
);

Quite often I have seen developers avoid using a * in the inner subquery thinking this will produce less overhead.  For example putting in a "SELECT 1" instead.  There is no need to do this.  Remember the EXISTS is simply checking to see if a row from the subquery is returned.  It doesn't care about the columns, so doesn't include them anyway.  Try running one against the other and checking out the execution plans.  You'll see they will be exactly the same.

Of interest: You'll also notice in the execution plan, that it's done a TOP (1) on the subquery.

NOT EXISTS:

NOT EXISTS is similar to EXISTS, of course, but will only evaluate as true if NO results are returned by the subquery.  Try the following:

select *
from #products
where not exists
(
select *
from #orderItems
where productName = #products.productName
);


Now we don't have the same problem that we had with NOT IN when it returned nulls.  But what if we wanted to compare two similar tables to see if any rows differ.  For example, you are bringing in data from one system to another, but don't want to insert or update any rows if an exact matching row already exists in the destination table.

Let's make another product table and alter some of the data:

insert into #products (productName,price) values (null,0);
select * into #products2 from #products;
update #products2 set productName = null where productID = 6;

Now lets try and highlight any rows that differ between the two product tables when comparing against ProductID and ProductName:

select productID,productName from #products p1
where not exists
(
select *
from #products2
where productID = p1.productID
and productName = p1.productName
);

This returns the following:
Not Exists Comparison Results







Row 1:  Great.  It recognised that product 6 was different.  #Product2 had the product name for this item as null.  #Product did not.

Row 2:  Not what we wanted.  Product 7 is exactly the same in both tables, so why has it shown up here.  Remember what we discussed in the previous post about nulls and that they essentially mean "unknown."  When comparing two unknowns, the end result will also be unknown.  So it cannot match the two.

You could of course use the ISNULL function for both ProductName columns, but this adds overhead, especially if you have many columns to compare.

This leads us on nicely to the next chapter where I will go over EXCEPT and INTERSECT.

Till then, MERRY CHRISTMAS!

Saturday, 2 August 2014

IN / NOT IN vs EXISTS / NOT EXISTS vs INTERSECT / EXCEPT vs JOIN (Part 1)

There are a number of ways of filtering your data according what exists or doesn't exist in another table or result set.  These are:

  1. IN and NOT IN
  2. EXISTS and NOT EXISTS
  3. INTERSECT and EXCEPT
  4. LEFT JOIN

In this series I will address the pros and cons of each, and how they handle nulls. I won't be discussing how you use them, but rather what they do and when to use them.

Let's start by prepping the tables that we'll use in the examples:

create table #orderItems ( orderItemID integer identity (1,1), productID integer, productName varchar(100), itemTotal decimal(6,2));
create table #products ( productID integer identity (1,1), productName varchar(100), price decimal(6,2));
insert into #products (productName,price) values ('Big Widget',39.99), ('Small Widget',12.95), ('Widget Multipack',72.95), ('Black Widget',14.99), ('Big Widget',39.99),  --Duplicate ('Super Widget',99.99);
insert into #orderItems (productID,productName,itemTotal) values (1,'Big Widget',35), (2,'Small Widget',12.95), (1,'Big Widget',39.99), (2,'Small Widget',12.95), (1,null,35), (5,'Super Widget',99.99);

The examples below will all be attempting to find the products that have had an order, or products that have not.  Note we have one duplicate product and we have one orderItem that has a null productName.  It's not very normalised, but for the purpose of this exercise we'll be matching on productName.

IN and NOT IN

These essentially just create a long list of AND or OR clauses and will of course return all rows, regardless of duplicates, from the containing query that match (or do not match) the resulting values from the subquery.  Lets look at IN first:

The following two queries are synonymous.  The second query is just to illustrate the logic of how the IN operator works.

select * 
from #products
where productName in (select productName from #orderItems);

select * 
from #products
where productName = 'Big Widget'
or productName = 'Small Widget'
or productName = 'Big Widget'
or productName = 'Small Widget'
or productName = null
or productName = 'Super Widget';  

The IN operator essentially creates a bunch of OR EQUAL TO conditions and a result is returned.  Now let's try NOT IN.

Again, the following two queries are synonymous:

select * 
from #products
where productName not in (select productName from #orderItems)

select * 
from #products
where productName <> 'Big Widget'
and productName <> 'Small Widget'
and productName <> 'Big Widget'
and productName <> 'Small Widget'
and productName <> null
and productName <> 'Super Widget';

Now for a NOT IN, it creates AND NOT EQUAL TO conditions.  Because these are now AND conditions, all of the results have to equate to true.  What happens when you run either of these queries?  What did you expect? Instead of seeing the two products that have no orders as you might expect, there are no results at all.  Can you see why? Remember that a null is an unknown value.  Nothing can equal, or not equal, an unknown.  So "productName <> null" does not return as true and because all results in the subquery set must be true, no result is returned.

Summary:

  • Be wary of nulls when doing a NOT IN.  Know your data if you plan on using NOT IN.
  • All rows will be returned. If you wanted distinct product names you would would have to use something like a distinct statement.
  • You can only match on one column.


Next:  EXISTS and NOT EXISTS

Tuesday, 17 June 2014

Reseeding Identities

I did a job today where I needed to to copy data from one table to another.  Both tables were exactly the same, with identity columns and the destination table empty.  In auto-pilot mode, I was about to write a script to copy the data across, capture the new identity numbers to have them mapped to the old one (much like in my previous post Capturing identity ID's from an insert or update).  But then of course I quickly realised that there was a much simpler way of doing this.  It didn't matter if the identity numbers were contiguous. I could just switch identity insert on, and retain all the values as they were, which also made it easier when I copied across data from other tables that linked on foreign keys.

A couple of things to take note of though, when doing this:

  1. You can't just insert a value into an identity column.
  2. When you DO insert values into an identity column, the values will be higher than the next number to be generated, which will fail due to a unique constraint.
So, you first need to enable identity insert.  This will, wait for it... allow you to insert values into an identity column. Duh, right?  Just make sure you switch it off again when you're done.  This is the statement that you need to run:

SET IDENTITY_INSERT {table_name} ON | OFF;

Another thing to be aware of in order to insert... you need to specify all of the column names in the insert statement.  There's no using select *, or relying on the order of columns in the destination  table.

So you've set identity_insert to on and inserted the data, probably using insert into select.  Remember that the value of the next identity number to be generated won't be affected by the insert and so will need to be changed.  This is done using the DBCC CHECKIDENT command, which can also be used to see what the current identity value is.  To reseed though, you will run the following:

DBCC CHECKIDENT ({table_name}, RESEED, n);

Where n is value of the next number you want to have generated minus 1.  So if the highest value of the records you've inserted is 666, then you will reseed the column to be 666, and the next number it generates will be 667.

---------------------
--Reseed Example--
---------------------

declare @maxID int;
set identity_insert destinationTable on; 
insert into destinationTable (ID, col1, col2, col3)
select (ID, col1, col2, col3)
from sourceTable;
 
set identity_insert destinationTable off; 
select @maxID = max(ID) from destinationTable; 
DBCC CHECKIDENT (destinationTable, reseed, @maxID);






Wednesday, 21 May 2014

Recursive CTEs

One feature that I've found more and more useful (2005+) is the Common Table Expression (CTE).  A CTE essentially allows you to create a temporary table, much like a derived table, but can be reference multiple times and is easier to read and script.  But it has the added bonus of being able to reference itself in a recursive way.

As an example scenario, let's say we have a table that hold folders or areas that other records can be linked to, a document record, for example.  Now we want to be able to have as many folders and sub-folders as we like... so we contain these in a table like so:


Use the below to create this table as a temporary table.

create table #folders (id int identity(1,1),FolderName varchar(20),parentID int);

insert into #folders(FolderName,parentID) values
('Top1',0),
('Top2',0),
('Mid1',1),
('Bot1',3),
('Mid2a',2),
('Mid2b',2),
('NextMid2a',5),
('Bot2a',7);
Now lets say we have a document assigned to folder ID 8 (Bot2a).  We want to be able to show the full path that the folder is contained in, but that information isn't found in the #folders table.  We are able to trace it back though.


Looking at my extremely straight arrows, we can see that each folder is assigned a parent folder, with the top level folders having a parentID of 0.  So we can see that the full folder path for a document in Bot2a will be "Top2\Mid2a\NextMid2a\Bot2a."  So how do we get the path without writing a convoluted while loop?  Enter the recursive CTE.

Just like any normal CTE, it will have a query contained in the WITH section, however this time the must consist of two unioned select statements.  The first one will be the Anchor statement, containing only the top level records, and the second statement will be the recursive one, excluding the root folders.  Here:

with CTE as
(
    select ID,folderName,parentID,cast(folderName as varchar(250)) as path,1 as PathLevel
    from #folders
    where parentID = 0


    union all

    select f.id,f.foldername,f.parentID,cast(cte.path + '\' + f.folderName as varchar(250)),cte.Pathlevel + 1
    from #folders f
    join cte on cte.ID = f.parentID
    where f.ID <> 0

)
select * from CTE;
Running this will give you the following results:


So you can see each folder is presented with its full path, up to and including itself.  Let's process what we just did.  In the statement highlighted in orange, we simply queried to return only the top level folders.  We also created a Path and PathLevel column.  For the anchor records, the full path will be equal to just the folder name, and we'll assign PathLevel a value of one to represent at what level in a path the current folder sits in.

In the recursive statement highlighted in blue, you can see we joined the CTE itself!  This is where we link the current record to its parent in the anchor statement.  We can then refer to the previous iteration and build on the Path string value, or add a 1 for the level.

Pretty easy.  Pretty quick.

Bear in mind that this can be an expensive query with large set of data with large numbers of recursions.  Also, if a mistake is made with your joins you could potentially end up in an in infinite loop.  Or at least you would if SQL Server didn't enforce a default limit of 100 recursions.  This essentially means that (in this example) you can't get more than 100 folders in your path. If you exceed this you will be presented with the following error:

Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

If you want to see this for yourself, simply change the join to "join cte on cte.ID = cte.ID"  to create an infinite loop.  You can change the default limit though by simply adding the following option at the end of your full statement,"OPTION (MAXRECURSION n)" where of course n is whatever value you choose.  Try changing the MAXRECURSION limit to 200 and running the CTE with the infinite loop again.

What other uses for a recursive CTE can you think of?





Wednesday, 7 May 2014

Understanding locks part 2 - Types of Locks

In my last post we saw how to find and determine locks on your resources.  But what are the different types of locks and what do they do?

The two main ones would be exclusive (X) and shared (S).  An exclusive locks will lock that resource to prevent any other transaction from either writing to or reading from that resource, while a shared lock will allow others to read, but not write or update.

So multiple read locks can be made on the same resource.  I guess that's why they called it shared.  Here's a simple table showing which simultaneous locks are allowed.

S
X
S
Yes
No
X
No
No

So if you have a exclusive lock, you have to wait for that transaction to complete, before your shared lock can take place.  Or vice versa.  But if you have a shared lock first, and the second one is a shared lock as well, no blocking occurs.
Another type of lock are the intent locks.  The main flavours will be intent shared (IS) and intent exclusive (IX).  
Think of these as a kind of marker, placed at the top of a hierarchical structure.  If a lock is required at row level then a shared lock, for example, could be placed on the row.  This would mean that if an exclusive lock is required on the same row, a lot of resource would be wasted in finding that row, only to find out it's already locked.  What you could do is place a marker at the table level with details about the row lock.  This saves the next transaction needlessly traversing the hierarchy to determine if a lock at row level is already in place.
Let's see it in action.  In AdventureWorks (2008 or 2012), begin a transaction with an update:
begin transaction updateit1;
update person.Person set FirstName = 'Kenny' where BusinessEntityID = 1;
In a separate query window / connection, run the query to find locks posted as in my previous post, Understanding locks part 1 - Finding locks and blocks.  You would expect to see row or key (for indexed columns) level blocks for this as we're updating only one row.  In this case you should see an IX lock on the table, and again on at least one page, finally followed by an X lock on the resource.  As you can see the details on the final actual exclusive lock trickle down through the hierarchy of the index pages. 
Now in yet another query window, run the following select statement:
begin transaction selectit1;
select * from person.Person where BusinessEntityID = 1;
If you check the locks again, you will now see something similar for S and IS locks. Same principle applies.  Now try running the same update statement in yet another window:
begin transaction updateit2;
update person.Person set FirstName = 'Kenny' where BusinessEntityID = 1;
Check the locks again and you will see anew type of lock. U and IU for update and intent update.  An update locks is simply a type of exclusive lock that will become a proper exclusive lock once it has finished being blocked.  If you now commit or rollback updateit1, the exclusive locks for that transaction will vanish, as will the shared locks, and the update locks from updateit2 will change to exclusive locks.  Commit or rollback updateit2 to clear down all locks.
Further details can be gleaned from http://technet.microsoft.com/en-us/library/aa213039(v=sql.80).aspx

Monday, 14 April 2014

Understanding locks part 1 - Finding locks and blocks

One thing I've had trouble with has been locks. What are the different types of locks and why are they important.  Locking can occur on various resources, such as a row, page, table or database.

How these locks work vary according to the lock type.  For example, you may get a lock on a row that prevents any updates to that row, but allows reading.  Or it may even prevent reading as well. SQL Server creates locks on resources to maintain compliance with the ACID principles.  In a nutshell, a lock will prevent a second user from either modifying a record at the same time as someone else or prevent reading from a resource before it has finished updating. 

A block occurs when a lock prevents another transaction from completing when that other transaction attempts to handle the same resource.  Locking and blocking are part of normal every day activity for SQL server, but sometimes they can cause application issues due to things like badly written queries or running large reports in a live environment.  So it helps to know how to locate locks to zero in on the source of the issue.

Finding locks:

First, let's create a lock or two. I'm using the AdventureWorks2008R2 database for this example.

begin transaction updateit;
update person.Person set FirstName = 'Kenny' where BusinessEntityID = 1;
Now that we have an un-commited transaction with an update, this should have created a lock, probably a row and / or key lock (more on the lock types in a later post).  Now how do we locate the locks and what types they are?  This is where the sys.dm_tran_locks DMV comes in (2005+).  Try this:
select * from sys.dm_tran_locks;
This gives some useful information but it would make a lot more sense if we joined to some system tables.  Now try this:
select db.name as DBName,l.resource_type,o.name as ObectName,au.type_desc,i.name as IndexName,p.rows,
    l.resource_description,l.request_mode
from sys.dm_tran_locks l  --Locking DMV
join sys.databases db on db.database_id = l.resource_database_id
left join sys.allocation_units au on au.allocation_unit_id =      l.resource_associated_entity_id
left join sys.partitions p on p.hobt_id =   l.resource_associated_entity_id
left join sys.indexes i on i.index_id = p.index_id
    and i.object_id = p.object_id
left join sys.objects o on o.object_id =
    case
        when p.partition_id is null then l.resource_associated_entity_id
        else p.object_id
    end
where db.database_id = DB_ID()  --Limit results to this database
    and l.resource_associated_entity_id <> 0 
    and o.name not like 'sys%'  --Ignore system base tables
order by o.name,i.name,l.resource_type,l.request_type
;
 You will need to run this in the relevant database, i.e. AdventureWorks2008R2 for this example.  This should give you something that looks like:









Now we know what resources are locked.  We can see it's against the Person table and they are row level and key locks.  We also know the lock types, which are exclusive or intent exclusive locks.


Finding Blocks:

Now let's create a block.  Open up a separate tab / connection in SSMS to AdventureWorks2008R2 while the first transaction still remains uncommited, and run the following:

select * from person.Person where BusinessEntityID = 1;
Now because the first transaction hasn't been committed or rolled back yet, this statement won't complete and is now blocked.  If you run the above query again you should notice a few more rows, in particular intent shared locks.  But we're interested in blocks only at this point.

You could run "execute sp_who2; " and check the BlkBy column for Spids that are blocking others, and then run "DBCC INPUTBUFFER(spid); " with the source spid (there may be several nested blocks) to get the locking statement that's creating the block.

Or you could again use a DMV again to make things easier to spot.  Try this:

SELECT r.session_id, db.name as DBName, r.start_time, r.status, r.command, r.blocking_session_id, wait_type, wait_time, wait_resource, cpu_time, st.text as SQLText
FROM sys.dm_exec_requests r
join sys.databases db on db.database_id = r.database_id
cross apply sys.dm_exec_sql_text(r.sql_handle) st
where r.blocking_session_id > 0
order by start_time
;
This should give you something like this:




Unlike sp_who2, this will only return blocked transactions, and will also give you the statement that is blocked and the time that it started.  Now you can use DBCC INPUTBUFFER as above.

Now commit or rollback the original update statement.  e.g. "rollback transaction updateit; "  You should now notice three things.
  1) The select statement has now returned its results.
  2) If you run the query to find locks, you should now longer see the locks
  3) If you run the query to find blocks, these will be gone too.

If push came to shove, and if there was no alternative, you could kill the spid of the transaction causing the blocks. In the above example, instead of committing or rolling back, you could run "kill 59 "  Note that in our example screenshot, 59 is the blocking spid.

In the next post I intend to talk about the different lock types.


Sunday, 6 April 2014

Saving index definitions and dropping indexes for large inserts

I was going to talk about how to capture index definitions so that you can drop indexes and then at a later point re-create them... but I got a bit carried away and wrote an SP that will do it all for you.  Before I share that though, a little something on why I did this.

Recently I've had a couple of large data migrations to do and always to tables that have a number of indexes.  There are two problems with this.

  1. Indexes add an overhead when you are inserting data and Microsoft advise to drop from one to all indexes depending on how much data you are inserting.  Dropping the indexes could add a good deal of speed to your insert.  Here is a guideline table of the indexes to drop per amount of data being inserted.  For example, you should drop the clustered index if the amount of data you are inserting exceeds 30% of the number of rows that already exist in your destination table.  Note how Microsoft advise to drop the indexes, not disable then rebuild.  The table is taken from Guidelines for Optimizing Bulk Import.
IndexesRelative amount of new data
Clustered index only30 percent
Clustered and one nonclustered index25 percent
Clustered and two nonclustered indexes25 percent
Single nonclustered index only100 percent
Two nonclustered indexes60 percent

  1.  Minimal logging can help when inserting large amounts of rows.  There are a few things you can do to avoid your transaction logs filling up, and minimal logging can help.  Plus it could also greatly improve the performance of your insert.  However, minimal logging has a few requirements.  And one of them is to not have a clustered index when inserting to a table that already has data. There are a number of additional requirements and these may vary according to your SQL version, however that's a discussion for another day. 
Suffice to say, with the two above points in mind, it would help to have something to drop your indexes, then rebuild them, without having to manually script out your indexes, then run them in again afterwards.  You'll want something you can incorporate into your migration scripts or package.

 To this end I've written an SP called dbo.DropCreateIndexes.  To use this, you should be aware of the following parameters

  • @Action:
    • Purpose - Define what you want the SP to do, e.g. drop or re-create the the indexes.
    • Options:
      • Drop - Drop the indexes on a table and save the definitions for later use.
      • DropOnly - Drop the indexes without saving the definitions.
      • Create - Recreate the indexes for a defined table, or for all saved indexes.  Definitions are deleted from the IndexDefinition table for indexes that are re-created.
      • SaveOnly - Saves the index definitions to the IndexDefinition table only.
      • Output (default) - Will output the definitions without doing any work.
  • @ShemaName:
    • Purpose - Define the schema to which the table belongs.  Defaults to dbo.
  • @TableName:
    • Purpose - Define the table to save the drop the indexes from, or re-create them to.
The SP will save any definitions to dbo.IndexDefinitions created on whatever database the SP has been created on. As SQL server doesn't seem store index definitions, this procedure will create the definitions from scratch according to what's in various system tables.  Simply use this to drop and save the indexes before your data load, then later re-create the indexes on the same table.

I have tried to cover all bases so this will also handle, multi-column indexes, included columns, filtered indexes, fill factors, table spaces, index padding, allowing row locks and ignoring duplicate keys.  If I've missed anything, be sure to let me know :)

NOTE: I have not fully tested this in all scenarios, nor have I accounted for any indexes other than clustered, non-clustered and primary keys. Please ensure you test this properly before using it.  It still needs some error handling as well.  I expect I'll be making the occasional tweak to it (such as maybe dropping clustered indexes only) as I continue to use it.  Suggestions are welcome.

A user defined function called dbo.concatStrings is also required, the script for which is below as well.  Make sure you install this along with the SP as it is required by the SP for building the create statements.

So here is the SP:

CREATE procedure [dbo].[DropCreateIndexes]
    @Action nvarchar(10) = 'Output',  --'Drop', 'DropOnly', 'Create', 'SaveOnly' or 'Output'
    @SchemaName nvarchar(128) = 'dbo',
    @TableName nvarchar(128)      --Table name or 'All' for create indexes only
   
AS

DECLARE
    @object_id bigint,
    @SQL nvarchar(max);
   
BEGIN
    set @object_id = OBJECT_ID(@TableName,'U');
   
    --CREATE SAVE TABLE:
    if OBJECT_ID('dbo.IndexDefinitions','U') is null
    BEGIN
        create table dbo.IndexDefinitions(
            ID int identity(1,1) not null,
            IndexName nvarchar(128) not null,
            TableName nvarchar(128) not null,
            SchemaName nvarchar(128) not null,
            SaveDate datetime default getdate(),
            CreateSQL nvarchar(max),
        );
    END;

   
    --SAVE INDEX DEFINITIONS FOR TABLE:
    if @Action in ('Drop','SaveOnly','Output')
    BEGIN
        --Delete any rows that exist for the same table if they exist
        delete from dbo.IndexDefinitions
        where TableName = @TableName
            and SchemaName = @SchemaName;
       
        --Create and save the definitions to dbo.indexDefinitions
        with ConcatCols as
        (
            select i.object_id,i.index_id,i.name as IndexName,stuff(
                (select ',[' + name + '] ' + case ic.is_descending_key
                        when 1 then 'DESC'
                        else 'ASC'
                    end
                from sys.columns c
                join sys.index_columns ic on ic.object_id = c.object_id
                    and ic.column_id = c.column_id
                where c.object_id = i.object_id
                    and ic.index_id = i.index_id
                    and ic.is_included_column <> 1
                for xml path(''))
            ,1,1,'')
            as ColumnParts
            from sys.indexes i
            where i.object_id = @object_id
        ),
        includedCols as (
            select i.object_id,i.index_id,i.name as IndexName,stuff(
                (select ',[' + name + ']'
                from sys.columns c
                join sys.index_columns ic on ic.object_id = c.object_id
                    and ic.column_id = c.column_id
                where c.object_id = i.object_id
                    and ic.index_id = i.index_id
                    and ic.is_included_column = 1
                for xml path(''))
            ,1,1,'')
            as IncludedColumns
            from sys.indexes i
        )
        select i.name as IndexName,o.name as TableName,schema_name(o.schema_id) as SchemaName,
            --PRIMARY KEY PART
            case i.is_primary_key
                when 1 then  'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name + '] ADD PRIMARY KEY CLUSTERED ('
            --INDEX PART
                else 'CREATE ' + case is_unique when 1 then 'UNIQUE ' else ' ' end + i.type_desc collate database_default + ' INDEX [' + i.name + '] on [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name + '] ('
            end + cc.ColumnParts + ') ' +
            --INCLUDED COLUMNS
            case
                when inc.IncludedColumns is null then ''
                else 'INCLUDE (' + inc.IncludedColumns + ')'
            end +
            --FILTER DEFINITIONS
            case i.has_filter
                when 0 then ''
                else ' WHERE' + i.filter_definition
            end +   
            --COMPLETE WITH OPTIONS
            ' WITH (' +
                case i.is_primary_key
                    when 0 then
                        'SORT_IN_TEMPDB = OFF,' +
                        'ONLINE = ON,' +
                        'DROP_EXISTING = OFF,'
                    else ''
                end +
                'PAD_INDEX = ' + case i.is_padded when 1 then 'ON' else 'OFF' end + ',' +
                'STATISTICS_NORECOMPUTE = ' + case i.is_padded when 1 then 'ON' else 'OFF' end + ',' +       
                'IGNORE_DUP_KEY = ' + case i.ignore_dup_key when 1 then 'ON' else 'OFF' end + ',' +       
                'ALLOW_ROW_LOCKS = ' + case i.allow_row_locks when 1 then 'ON' else 'OFF' end + ',' +
                case i.fill_factor when 0 then '' else 'FILLFACTOR = ' + cast(i.fill_factor as varchar(3)) + ',' end +
                'STATISTICS_NORECOMPUTE = OFF)' +
            ' ON [' + fg.name + ']' as CreateSQL
        into #tempDef
        from sys.indexes i
        join sys.objects o on o.object_id = i.object_id
        join ConcatCols cc on cc.object_id = o.object_id
            and cc.index_id = i.index_id
        join sys.filegroups fg on fg.data_space_id = i.data_space_id
        left join includedCols inc on inc.object_id = i.object_id
            and inc.index_id = i.index_id;
           
           
        if @Action in ('Drop','SaveOnly')
        BEGIN
            insert into dbo.IndexDefinitions(IndexName,TableName,SchemaName,CreateSQL)
            select * from #tempDef;
        END;
       
        if @Action in ('Output')
        BEGIN
            Select * from #tempDef;
        END;
       
    END;
       
   
    --DROP ALL INDEX ON TABLE:
    if @Action in ('Drop', 'DropOnly')
    BEGIN
        select @SQL = (
            select case
                when i.is_primary_key = 1 then   
                    'ALTER TABLE [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name +'] DROP CONSTRAINT ' + i.name + ';'
                else
                    'DROP INDEX [' + i.name + '] ON [' + SCHEMA_NAME(o.schema_id) + '].[' + o.name +'];'
            end
            from sys.indexes i
            join sys.objects o on o.object_id = i.object_id
            where o.object_id = @object_id
            for xml path ('')
        );       

        EXEC sp_executesql @SQL;
               
    END;
   
    if @Action = 'Create'
    BEGIN
        select @SQL = dbo.concatStrings(@TableName,@SchemaName);

        EXEC sp_executesql @SQL;   
       
        delete from dbo.IndexDefinitions
        where TableName = case @TableName when 'All' then TableName else @TableName end
            and SchemaName = case @TableName when 'All' then SchemaName else @SchemaName end;
       
    END;       
END;

And now for the function:

CREATE function [dbo].[concatStrings](@TableName nvarchar(128), @SchemaName nvarchar(128) = 'dbo')
returns nvarchar(max)

as

BEGIN
    declare @x varchar(max)
   
    select @x = coalesce(@x+';','') + CreateSQL
            from IndexDefinitions
            where TableName = case @TableName when 'All' then TableName else @TableName end
                and SchemaName = case @TableName when 'All' then SchemaName else @SchemaName end
           
    return @x;
END;

There you go.  Enjoy.  Any feedback is welcome.

*EDIT*
Here's an example:
exec dbo.DropCreateIndexes
    @action = N'Create',
    @TableName = N'dropIndexTest'

Saturday, 29 March 2014

Capturing identity ID's from an insert or update

So you need to insert data into multiple columns, all linked with identity columns for primary keys.  You need to be able to insert into one table, and pass on the value from the identity column to the foreign key of another table.  You can't use @@IDENTITY or SCOPE_IDENTITY() unless you do so in an implicit cursor, and I've seen plenty of this, but cursors like this are slow and cumbersome... and high on resources.  Such a script could take hours to run when writing it as a set based solution could run in a mere fraction of the time. Let's assume you want to avoid your script running like a tortoise through a pool molasses... so we won't be focusing on cursors.

Of course the data coming over will have ID's.  You could create a new column to store that ID, but then you're changing the schema and the developers might not be too happy about that.  You're also adding redundant data that you may forget to remove, and taking up more disc space than you need to.  I've done this when starting out myself, but there is a better way.

Use the OUTPUT clause to log to a temporary table, including the generated identity numbers in thew system as well as those of the old one, allowing you to map the old ID's to the new ID's.  A temporary table will persist for only as long as your session, so you don't have to worry about it taking up any disc space.  You won't change any schema's and it's pretty fast.  HOWEVER, it's important to note that a normal insert statement will not return the ID's of the source table to the output clause.  Output requires an inserted or deleted value from the destination table, or for the the column to be contained in a from clause belonging to the insert statement.  To do this we can use the MERGE statement.

Here is an example scenario.  The NSA are keeping track of our phone calls, who the caller and receiver is and if it's a call of interest.  They need to migrate this information out of their PRISM system into their new uber secret one running with SQL Server 2012, called TipTop.  We have a table for phone numbers and a call table that logs calls between these numbers with a flag if the call is of interest to the NSA.

Lets first create the tables we need, and populate the old PRISM tables with data.

--CREATE THE TABLES
create table #PrismNumbers(
NumberID int identity (100,1),
Number varchar(12),
BillPayerSurname varchar(25)
);

create table #PrismCalls(
CallID int identity(100, 1),
CallerID int,
ReceiverID int,
OfInterest char(1)
);

create table #TipTopNumbers(
TTNumberID int identity(200,1),
TTNumber varchar(12),
TTBillPayerSurname varchar(25)
);

create table #TipTopCalls(
TTCallID int identity(200, 1),
TTCallerID int,
TTReceiverID int,
TTOfInterest char(1)
);

create table #IDMap(
TTID int,
PrismID int
);

--POPULATE THE TABLES WITH DATA
insert into #PrismNumbers(Number,BillPayerSurname) values
('01999 897786','Smith'),
('01998 564453','Jackson'),
('01997 786675','White'),
('01996 231122','Johnson'),
('01999 345679','Williams');

insert into #PrismCalls(CallerID,ReceiverID,OfInterest) values
(100,104,'N'),
(103,102,'N'),
(101,104,'Y'),
(103,101,'Y');

Ok... now that we have our tables in place, we need to start our migration.  We first need to migrate the numbers, outputting to the IDMap table so we can insert the correct new ID's for the TipTop system.  This is where we use the merge statement:

merge into #TipTopNumbers as tt
using #PrismNumbers as p
on 1=2 --<-- Merge condition that will always be false
when not matched
then insert (TTNumber,TTBillPayerSurname) 
values (p.Number,p.BillPayerSurname)
output inserted.TTNumberID as TTIDs,p.NumberID as PrismIDs
into #IDMAP;


Now try "select * from #IDMap" to see the mappings of ID's  Now we can finally migrate the calls that reference the new TipTop number table.

insert into #TipTopCalls(TTCallerID,TTReceiverID,TTOfInterest)
select cm.TTID,rm.TTID,c.OfInterest
from #PrismCalls c
join #IDMap cm on cm.PrismID = c.CallerID
join #IDMap rm on rm.PrismID = c.ReceiverID;

And there you go.  You've now migrated data from Prism to TipTop without bringing over the old ID's. Bear in mind though that Merge is only available from 2008.  Happy migrating!

*Edit 30/03/2014*
I should mention more on how the Merge statement is used here.  Of course this isn't the intended use. We're just using it to do inserts in an ANSI standard format that allows us to use the output clause in the way we need.  To do this we ensure that the condition will always be false with a predicate of 1=2.  If you want to apply joins or a more complex query to define the source data, you can use a common table expression (CTE).

*Edit 31/03/2014*
Of course, maybe you don't need to output to a temporary table.  You could output directly the table you intend to migrate to.


Saturday, 22 March 2014

Start blogging!

I have two favourite quotes:
  • "If you can't explain it simply, you don't understand it well enough." - From one of the greatest minds, Albert Einstein.
This is so true.  Sometimes I may think I understand something, yet when I try to verbalise it I fall short.  So I've decided to blog (when my two little boys let me).  Since becoming a SQL developer, of sorts, I've found found that there is always so much more to learn, and so many people out there willing to give their time to help.  Without them I probably wouldn't be in the position I am today (thanks guys).  Of course I also realise that they do this to also help themselves, to put their skills out where people can see it and also, very much, to learn.  This is my attempt to join that community, as I want to learn and share as much as I can.

  • "Knowing is not enough, we must apply. Willing is not enough, we must do." - from the late and great Bruce Lee
Always an inspirational quote.  It makes you want to get up off your butt and do it!  I often learn something about SQL that I just won't ever get a chance to do at work.  I've often thought about blogging, but never got started.

This blog will concentrate mainly on SQL Server, with maybe a smattering of Oracle.  Now just one thing remains..... where the hell do I start?  


Deploying out of hours shouldn't be a thing

Getting code into production should not be hard. Nor should it have to be scheduled for special times. I have recently been involved with th...