Wednesday 21 May 2014

Recursive CTEs

One feature that I've found more and more useful (2005+) is the Common Table Expression (CTE).  A CTE essentially allows you to create a temporary table, much like a derived table, but can be reference multiple times and is easier to read and script.  But it has the added bonus of being able to reference itself in a recursive way.

As an example scenario, let's say we have a table that hold folders or areas that other records can be linked to, a document record, for example.  Now we want to be able to have as many folders and sub-folders as we like... so we contain these in a table like so:


Use the below to create this table as a temporary table.

create table #folders (id int identity(1,1),FolderName varchar(20),parentID int);

insert into #folders(FolderName,parentID) values
('Top1',0),
('Top2',0),
('Mid1',1),
('Bot1',3),
('Mid2a',2),
('Mid2b',2),
('NextMid2a',5),
('Bot2a',7);
Now lets say we have a document assigned to folder ID 8 (Bot2a).  We want to be able to show the full path that the folder is contained in, but that information isn't found in the #folders table.  We are able to trace it back though.


Looking at my extremely straight arrows, we can see that each folder is assigned a parent folder, with the top level folders having a parentID of 0.  So we can see that the full folder path for a document in Bot2a will be "Top2\Mid2a\NextMid2a\Bot2a."  So how do we get the path without writing a convoluted while loop?  Enter the recursive CTE.

Just like any normal CTE, it will have a query contained in the WITH section, however this time the must consist of two unioned select statements.  The first one will be the Anchor statement, containing only the top level records, and the second statement will be the recursive one, excluding the root folders.  Here:

with CTE as
(
    select ID,folderName,parentID,cast(folderName as varchar(250)) as path,1 as PathLevel
    from #folders
    where parentID = 0


    union all

    select f.id,f.foldername,f.parentID,cast(cte.path + '\' + f.folderName as varchar(250)),cte.Pathlevel + 1
    from #folders f
    join cte on cte.ID = f.parentID
    where f.ID <> 0

)
select * from CTE;
Running this will give you the following results:


So you can see each folder is presented with its full path, up to and including itself.  Let's process what we just did.  In the statement highlighted in orange, we simply queried to return only the top level folders.  We also created a Path and PathLevel column.  For the anchor records, the full path will be equal to just the folder name, and we'll assign PathLevel a value of one to represent at what level in a path the current folder sits in.

In the recursive statement highlighted in blue, you can see we joined the CTE itself!  This is where we link the current record to its parent in the anchor statement.  We can then refer to the previous iteration and build on the Path string value, or add a 1 for the level.

Pretty easy.  Pretty quick.

Bear in mind that this can be an expensive query with large set of data with large numbers of recursions.  Also, if a mistake is made with your joins you could potentially end up in an in infinite loop.  Or at least you would if SQL Server didn't enforce a default limit of 100 recursions.  This essentially means that (in this example) you can't get more than 100 folders in your path. If you exceed this you will be presented with the following error:

Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

If you want to see this for yourself, simply change the join to "join cte on cte.ID = cte.ID"  to create an infinite loop.  You can change the default limit though by simply adding the following option at the end of your full statement,"OPTION (MAXRECURSION n)" where of course n is whatever value you choose.  Try changing the MAXRECURSION limit to 200 and running the CTE with the infinite loop again.

What other uses for a recursive CTE can you think of?





Wednesday 7 May 2014

Understanding locks part 2 - Types of Locks

In my last post we saw how to find and determine locks on your resources.  But what are the different types of locks and what do they do?

The two main ones would be exclusive (X) and shared (S).  An exclusive locks will lock that resource to prevent any other transaction from either writing to or reading from that resource, while a shared lock will allow others to read, but not write or update.

So multiple read locks can be made on the same resource.  I guess that's why they called it shared.  Here's a simple table showing which simultaneous locks are allowed.

S
X
S
Yes
No
X
No
No

So if you have a exclusive lock, you have to wait for that transaction to complete, before your shared lock can take place.  Or vice versa.  But if you have a shared lock first, and the second one is a shared lock as well, no blocking occurs.
Another type of lock are the intent locks.  The main flavours will be intent shared (IS) and intent exclusive (IX).  
Think of these as a kind of marker, placed at the top of a hierarchical structure.  If a lock is required at row level then a shared lock, for example, could be placed on the row.  This would mean that if an exclusive lock is required on the same row, a lot of resource would be wasted in finding that row, only to find out it's already locked.  What you could do is place a marker at the table level with details about the row lock.  This saves the next transaction needlessly traversing the hierarchy to determine if a lock at row level is already in place.
Let's see it in action.  In AdventureWorks (2008 or 2012), begin a transaction with an update:
begin transaction updateit1;
update person.Person set FirstName = 'Kenny' where BusinessEntityID = 1;
In a separate query window / connection, run the query to find locks posted as in my previous post, Understanding locks part 1 - Finding locks and blocks.  You would expect to see row or key (for indexed columns) level blocks for this as we're updating only one row.  In this case you should see an IX lock on the table, and again on at least one page, finally followed by an X lock on the resource.  As you can see the details on the final actual exclusive lock trickle down through the hierarchy of the index pages. 
Now in yet another query window, run the following select statement:
begin transaction selectit1;
select * from person.Person where BusinessEntityID = 1;
If you check the locks again, you will now see something similar for S and IS locks. Same principle applies.  Now try running the same update statement in yet another window:
begin transaction updateit2;
update person.Person set FirstName = 'Kenny' where BusinessEntityID = 1;
Check the locks again and you will see anew type of lock. U and IU for update and intent update.  An update locks is simply a type of exclusive lock that will become a proper exclusive lock once it has finished being blocked.  If you now commit or rollback updateit1, the exclusive locks for that transaction will vanish, as will the shared locks, and the update locks from updateit2 will change to exclusive locks.  Commit or rollback updateit2 to clear down all locks.
Further details can be gleaned from http://technet.microsoft.com/en-us/library/aa213039(v=sql.80).aspx

Deploying out of hours shouldn't be a thing

Getting code into production should not be hard. Nor should it have to be scheduled for special times. I have recently been involved with th...