hckr.fyi // thoughts

Using Pivot in SQL to Shape Data

by Michael Szul on

SQL Server has a lot of useful functions, commands, keywords, etc. that many of us don't use on a regular basis unless we're doing business intelligence and analyzing data. Pivot is one of those keywords.

With the recent launch of the online testing system that my team and I built for the University of Virginia's School of Medicine, there is a lot of reporting that gets looked at: exam results, raw exam metrics, binary data, individual student performance, and individual student answer choices. This last one required some trickery. They wanted to see each student's selected choice (letter designation for each question); meanwhile, our database stores their responses individually, as they are associated with questions on exams. It's easy to run a report that has a record for each answer for each question for each student, but that would produce ten lines per student (one for each question response). What you need is a column running down the page with each student name, but the questions running across the page with each designation under them. Pivot helps you accomplish this.

With Pivot, you are shifting the table horizontally for a particular column, while establishing the value to be placed in the pivoted columns. This will have to be an aggregate, so in most cases, it's a count or a sum, but if your expected result is not a number (such as with alphabetical designations for answers to questions), you can use Max on the column you're dispersing, so that you are essentially getting the max of a single record.

Below is an example of Pivot for questions and answer designations:

SELECT p.StudentName, p.DisplayOrder
    FROM 
    (
        SELECT
           a.StudentName
          ,e.DisplayOrder
          ,a.Designation
          FROM Exam e
            INNER JOIN Answers a
                ON e.QuestionID = a.QuestionID
        ) r
    PIVOT
    (
      MAX(Designation) FOR DisplayOrder  IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10])
    ) p
    

In the above example, you'll get 10 columns across numbered 1-10 (for questions 1-10), and the student names running down the first column of the result set (so 11 columns total). Intersecting them will be the designation selected by the student. In the above example, it assumes 10 questions, which is not always the case. For added trickery, you can actually dynamically write and then execute this as a stored procedure based on the number of questions in the exam. You can even do it directly in another stored procedure (executing a dynamic one from a static one) if you're feeling up to it.

Note that Pivot essentially changes the alias from which you are selecting, so that even though the result set above is aliased as r, after the pivot is made, the selection criteria at the top of the statement must reflect the alias of the Pivot, which is p in this example.