Usually when a delimited lists is need in T-SQL, XML Path has been the solution for generating a this delimited list in a single column. With the generation of XML, comes the concatenation needed to add the delimiter and the trimming function to remove the leading or trailing delimiter. Using AdventureWorks2017 database, the following snippet shows how a typical XML Path lists is implemented.

SELECT DISTINCT P.PersonType, 
                Business.ids 
FROM   Person.Person P 
       CROSS APPLY (SELECT Stuff((SELECT Concat(',', PID.businessentityid) 
                                  FROM   Person.Person PID 
                                  WHERE  P.PersonType= PID.PersonType 
                                  FOR XML PATH ('')), 1, 1, '') IDs)Business 

The well known XML Path

As shown above, XML Path can provide a way creating a delimited list, but it also requires using a SUBSTRING() or STUFF() function to remove trailing or leading delimiters generated. The main problem with XML Path resides on the execution plan. Let’s take a look on the execution plan generated for the query above.

Results, Stats and Execution

Results
Statistics Time and IO

As you can see above, the XML Path, generated an execution plan that requires a UDX operator and a Lazy Spool that created 77 MBs of data to be reduced to 24kbs of distinct data on the final output. This option took 74 seconds to return data, in any OLTP, that’s unnaceptable. Also the reads of this operation where very high at 80k logical reads and 193k LOB physical reads.

The Alternative

Starting in SQL 2017, Microsoft introduced the very neat function STRING_AGG. This function works like a concatenation function, but allows SQL to aggregate columns and delimiters just like XML Path will do in a much efficient way. Let’s take a look.

SELECT DISTINCT PersonType, 
                STRING_AGG(Cast(BusinessEntityID AS NVARCHAR(MAX)), ',')
                within  GROUP (ORDER BY BusinessEntityID ASC)
FROM   Person.Person 
GROUP  BY PersonType

Results, Stats and Execution

Results
Statistics Time and IO

As shown above, the STRING_AGG did not generated a UDX operator neither a Lazy Spool. But what does this means? The UDX operator is gone since no XML was used by the STRING_AGG and the lazy spool is gone because SQL did not created a temporary data-set to evaluate the data to create a delimited list. Also no hash match operator was created and a Stream Aggregate operator was used for the Group By portion of the script. There are some warnings on the execution plan due to implicit conversions (BusinessEntityID INT to NVARCHAR). Also, this version of the script only took 75ms to execute. This is 100 times faster than the XML Path implementation with only 37 logical reads. This function also allowed to generate this delimited column in-line without the need of derived queries.

Wrap Up

As this article shows, if you need a delimited list in TSQL and your product is using SQL 2017 or higher, STRING_AGG function is a more clean and performant way of obtaining it. It can be used for regular concatenations but has the ability of looping through a data set with an aggregate and create this delimited columns. Enjoy!

Leave a comment