Wednesday 21 May 2014

Recursive CTEs

One feature that I've found more and more useful (2005+) is the Common Table Expression (CTE).  A CTE essentially allows you to create a temporary table, much like a derived table, but can be reference multiple times and is easier to read and script.  But it has the added bonus of being able to reference itself in a recursive way.

As an example scenario, let's say we have a table that hold folders or areas that other records can be linked to, a document record, for example.  Now we want to be able to have as many folders and sub-folders as we like... so we contain these in a table like so:


Use the below to create this table as a temporary table.

create table #folders (id int identity(1,1),FolderName varchar(20),parentID int);

insert into #folders(FolderName,parentID) values
('Top1',0),
('Top2',0),
('Mid1',1),
('Bot1',3),
('Mid2a',2),
('Mid2b',2),
('NextMid2a',5),
('Bot2a',7);
Now lets say we have a document assigned to folder ID 8 (Bot2a).  We want to be able to show the full path that the folder is contained in, but that information isn't found in the #folders table.  We are able to trace it back though.


Looking at my extremely straight arrows, we can see that each folder is assigned a parent folder, with the top level folders having a parentID of 0.  So we can see that the full folder path for a document in Bot2a will be "Top2\Mid2a\NextMid2a\Bot2a."  So how do we get the path without writing a convoluted while loop?  Enter the recursive CTE.

Just like any normal CTE, it will have a query contained in the WITH section, however this time the must consist of two unioned select statements.  The first one will be the Anchor statement, containing only the top level records, and the second statement will be the recursive one, excluding the root folders.  Here:

with CTE as
(
    select ID,folderName,parentID,cast(folderName as varchar(250)) as path,1 as PathLevel
    from #folders
    where parentID = 0


    union all

    select f.id,f.foldername,f.parentID,cast(cte.path + '\' + f.folderName as varchar(250)),cte.Pathlevel + 1
    from #folders f
    join cte on cte.ID = f.parentID
    where f.ID <> 0

)
select * from CTE;
Running this will give you the following results:


So you can see each folder is presented with its full path, up to and including itself.  Let's process what we just did.  In the statement highlighted in orange, we simply queried to return only the top level folders.  We also created a Path and PathLevel column.  For the anchor records, the full path will be equal to just the folder name, and we'll assign PathLevel a value of one to represent at what level in a path the current folder sits in.

In the recursive statement highlighted in blue, you can see we joined the CTE itself!  This is where we link the current record to its parent in the anchor statement.  We can then refer to the previous iteration and build on the Path string value, or add a 1 for the level.

Pretty easy.  Pretty quick.

Bear in mind that this can be an expensive query with large set of data with large numbers of recursions.  Also, if a mistake is made with your joins you could potentially end up in an in infinite loop.  Or at least you would if SQL Server didn't enforce a default limit of 100 recursions.  This essentially means that (in this example) you can't get more than 100 folders in your path. If you exceed this you will be presented with the following error:

Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

If you want to see this for yourself, simply change the join to "join cte on cte.ID = cte.ID"  to create an infinite loop.  You can change the default limit though by simply adding the following option at the end of your full statement,"OPTION (MAXRECURSION n)" where of course n is whatever value you choose.  Try changing the MAXRECURSION limit to 200 and running the CTE with the infinite loop again.

What other uses for a recursive CTE can you think of?





No comments:

Post a Comment

Deploying out of hours shouldn't be a thing

Getting code into production should not be hard. Nor should it have to be scheduled for special times. I have recently been involved with th...