Thursday, August 20, 2009

Pivoting Data Using SQL 2005 (Part 2)

This is part two of a three part series focusing on pivoting data, using SQL Server 2005. In part one of the series, I focused on creating a static pivot using the pivot operator, http://jahaines.blogspot.com/2009/08/pivoting-data-using-sql-2005-part-1.html. In this post, I will be focusing on creating a dynamic pivot where the pivot values are not known.  I choose to use a differing sample of data than the first part of this series.  The code in the first part became too complex for simple demonstration.  The code became complex due to my having to calculate column quarters values.  I have attached the dynamic pivot solution, for the part one data, which can be found at the end of this article.  Let’s get started with our new table and data.

Table/Data

SET NOCOUNT ON
GO
 
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL
BEGIN
    DROP TABLE #Orders;
END
GO
 
CREATE TABLE #Orders(
OrderId INT IDENTITY(1,1) PRIMARY KEY,
CustId INT NOT NULL,
ProductDescr VARCHAR(50),
Sales_Dt DATETIME,
Sales_Amt NUMERIC(9,2)
);
GO
 
INSERT INTO #Orders VALUES (1,'Product1','2008-01-01',0.99);
INSERT INTO #Orders VALUES (2,'Product1','2008-04-01',1.99);
INSERT INTO #Orders VALUES (1,'Product3','2009-01-01',100.25);
INSERT INTO #Orders VALUES (1,'Product4','2009-02-01',999.99);
INSERT INTO #Orders VALUES (2,'Product2','2009-01-01',99.99);
INSERT INTO #Orders VALUES (2,'Product1','2009-01-03',107.20);
INSERT INTO #Orders VALUES (3,'Product4','2009-01-03',459.36);
INSERT INTO #Orders VALUES (3,'Product3','2009-05-01',19.99);
INSERT INTO #Orders VALUES (3,'Product5','2009-06-01',29.99);
GO

Now that we have our table, let’s start coding.  The driving force behind the code below is the variable assignment.  We build the column/value list dynamically by creating a process that appends distinct values to an existing string.  Once we have the column list, we just have to create the same query that we created in part one of this series and append the variables.  I created two variables to house the columns.  The variable @Cols is used to create our select list which ultimately is the value wrapped in the coalesce function.  For example, COALESCE([Product1],0) AS [Product1],COALESCE([Product2],0) AS [Product2]…..  The Second variable @Pivot_Cols is used to store the column metadata, in the format [Product1],[Product2],…  There are many different ways to accomplish variable assignment, but I tend to like the COALESCE method.  I will show you another method later on.

DECLARE @sql NVARCHAR(MAX),
        @cols VARCHAR(MAX),
        @Pivot_Cols VARCHAR(MAX)
    
--The code below reads through all the data in the table and
--builds a distinct column list.
SELECT 
    @cols = COALESCE(@cols + ',','') 
        + 'COALESCE(' + QUOTENAME(ProductDescr) + ',0) AS ' + QUOTENAME(ProductDescr),
    @Pivot_Cols = COALESCE(@Pivot_Cols + ',','') + QUOTENAME(ProductDescr)
FROM #Orders
GROUP BY QUOTENAME(ProductDescr)
ORDER BY QUOTENAME(ProductDescr) ASC
 
SELECT @sql = N'
SELECT 
    CustId,' + @Cols + '
FROM(
    SELECT 
        CustId,
        ProductDescr,
        Sales_Amt    
    FROM #Orders
    WHERE
        Sales_Dt >= ''2008-01-01''
        AND Sales_Dt < ''2010-01-01''
) AS dataToPivotTable
PIVOT(
    SUM(Sales_Amt) --Aggregate Function
    FOR [ProductDescr]-- Column To Pivot
    IN (' + @Pivot_Cols + ') --Values to Pivot
) AS whatToPivotOnTable;'
 
PRINT @sql
EXEC sp_executesql @sql

Note: I hard coded the date filters, but I advise you too use supply the date parameters using sp_executesql, as this will allow you to help parameterize the query.

Results:

image

We can also expand on our SQL 2000 solution, as shown below.

DECLARE @Cols VARCHAR(MAX)
DECLARE @sql NVARCHAR(MAX)
 
SELECT @cols = 
    MAX(COALESCE(@cols + ',','') 
    + N'SUM(CASE WHEN ProductDescr = '
    + QUOTENAME(ProductDescr,'''') + ' THEN Sales_Amt ELSE 0 END) AS ' 
    + QUOTENAME(ProductDescr))
FROM #Orders
GROUP BY QUOTENAME(ProductDescr)
ORDER BY QUOTENAME(ProductDescr) ASC
 
SET @sql = N'SELECT CustId,' + CHAR(13)
      + @Cols + CHAR(13) 
      + 'FROM #Orders' + CHAR(13) 
      + 'WHERE Sales_Dt >= ''2008-01-01'' AND Sales_Dt < ''2010-01-01'' ' + CHAR(13) 
      + 'GROUP BY CustId'
 
PRINT @sql
EXEC sp_executesql @sql

As promised, here is another method to solve variable assignment.

SELECT @cols =  
    STUFF((
        SELECT DISTINCT TOP 100 PERCENT 
            ',' + N'SUM(CASE WHEN ProductDescr = '
            + QUOTENAME(ProductDescr,'''') + ' THEN Sales_Amt ELSE 0 END) AS ' 
            + QUOTENAME(ProductDescr)
        FROM #Orders
        ORDER BY 
            ',' + N'SUM(CASE WHEN ProductDescr = '
            + QUOTENAME(ProductDescr,'''') + ' THEN Sales_Amt ELSE 0 END) AS ' 
            + QUOTENAME(ProductDescr) ASC
        FOR XML PATH (''))
        ,1,1,'')

As you can see, the dynamic pivot is fairly easy to implement.  It is much easier to dynamically pivot data when you do not have to calculate your column values. If you are wondering why I did not show solve the example from the last post (Maybe because you did not read the beginning of this post :-) , I choose not to post this solution because it is a little more complex because you have to calculate the column value; however, I will be attaching the solution via my sky drive.   I hope that you are still awake and ready for more pivoting fun because in part three of this series I am taking dynamic pivoting to the extreme.  In part 3, I will be making every single aspect of the pivot dynamic, including using column names that are stored as rows in a table.

Until next time, happy coding.

Dynamic solution – part1 data: http://cid-6f041c9a994564d8.skydrive.live.com/self.aspx/.Public/Dynamic%7C_Pivot/DyamicPivot.sql

2 comments:

Brad Schulz said...

Good post, Adam... I'm looking forward to Part 3.

By the way, I don't think you need the TOP 100 PERCENT in that final (XML PATH) query, since you don't use that query in a derived table or use it as a view. I think just the DISTINCT and ORDER BY are all that are needed.

In fact, now that I think about it, the ORDER BY probably isn't needed either, because, in order to do the DISTINCT, SQL is going to have to sort the data anyway.

daspeac said...

I believe you may also know about the Excel recovery tool