Article

  • Tech Tips
  • 07.12.19

Using the Listagg Function in SQL

  • by Jinny Case, Senior Research Analyst and Ashwin Jayagopal, Institutional Research Analyst, University of Texas at San Antonio

Our institutional research analysts frequently use SQL to extract data and restructure it to suit the needs of data consumers or to ease analysis. Often, we need to collapse tables containing multiple rows of data per student or faculty member into one row each. Typically, max, min, or some other aggregate function is used to gather only the necessary data into one row, but how do you handle a situation in which all of the data is needed? This is where the listagg function proves useful.

We use active athletics status as an example here. Student athlete data are considered part of public directory information. Upon receiving public information office (PIO) requests for directory data, we use the listagg function in our PIO code to ensure we include a description of all sports played while maintaining one record per student.

This query, which joins the Sport Information table (SGRSPRT) in Banner to the Sport validation table, serves as our base subquery for extracting athletes. Here, we select the person identifier along with the term code and sport for all active and fifth-year athletes.

TT-July-2019a
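For readers without access to the screenshot, a base query of this kind might look roughly like the sketch below. It is not the authors' exact code: the column names follow the usual Banner naming pattern but should be treated as assumptions, and the status codes 'AC' and '5Y' and all sample rows are hypothetical. The WITH clause only stands in for the Banner tables so the sketch runs on its own in Oracle.

```sql
-- Hypothetical stand-ins for the Banner tables so the sketch is self-contained.
-- Against a real Banner database, drop the WITH clause and query the tables directly.
WITH sgrsprt (sgrsprt_pidm, sgrsprt_term_code, sgrsprt_actc_code, sgrsprt_spst_code) AS (
  SELECT 1001, '201910', 'BSK', 'AC' FROM dual UNION ALL
  SELECT 1001, '201910', 'TRK', 'AC' FROM dual UNION ALL
  SELECT 1002, '201910', 'SOC', '5Y' FROM dual UNION ALL
  SELECT 1003, '201910', 'GLF', 'IN' FROM dual
),
stvactc (stvactc_code, stvactc_desc) AS (
  SELECT 'BSK', 'Basketball' FROM dual UNION ALL
  SELECT 'TRK', 'Track'      FROM dual UNION ALL
  SELECT 'SOC', 'Soccer'     FROM dual UNION ALL
  SELECT 'GLF', 'Golf'       FROM dual
)
-- One row per student, term, and sport; only active ('AC') and
-- fifth-year ('5Y') athletes are kept (both codes are assumed values).
SELECT s.sgrsprt_pidm,
       s.sgrsprt_term_code,
       a.stvactc_desc
FROM   sgrsprt s
JOIN   stvactc a
       ON a.stvactc_code = s.sgrsprt_actc_code
WHERE  s.sgrsprt_spst_code IN ('AC', '5Y')
ORDER  BY s.sgrsprt_pidm, a.stvactc_desc;
```

With this sample data, the student with PIDM 1001 comes back on two rows (one per sport), which is the duplication the rest of the article addresses.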

In rare instances, student athletes may play more than one sport, resulting in multiple rows per student in the query results. Here is an example of what the extracted data might look like in the case of multi-sport athletes.

TT-July-2019b

Since we want only one row for each unique student showing all sports played in a given time period, we need to modify the query to use the listagg function, which aggregates the data into one row, separating the values from the multiple rows with the delimiter of your choice. Again, we use SGRSPRT as an example:

TT-July-2019c
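As a hedged illustration of the modified query, the sketch below applies Oracle's LISTAGG with a comma-and-space delimiter and groups by student and term. The column names, status codes, and sample rows are assumptions for illustration, and ordering the list by PIDM is inferred from the article's description rather than copied from the original code.

```sql
-- Hypothetical stand-ins for the Banner tables so the sketch is self-contained.
WITH sgrsprt (sgrsprt_pidm, sgrsprt_term_code, sgrsprt_actc_code, sgrsprt_spst_code) AS (
  SELECT 1001, '201910', 'BSK', 'AC' FROM dual UNION ALL
  SELECT 1001, '201910', 'TRK', 'AC' FROM dual UNION ALL
  SELECT 1002, '201910', 'SOC', '5Y' FROM dual
),
stvactc (stvactc_code, stvactc_desc) AS (
  SELECT 'BSK', 'Basketball' FROM dual UNION ALL
  SELECT 'TRK', 'Track'      FROM dual UNION ALL
  SELECT 'SOC', 'Soccer'     FROM dual
)
-- LISTAGG collapses the sport descriptions of each (student, term) group
-- into a single delimited string, giving one row per student per term.
SELECT s.sgrsprt_pidm,
       s.sgrsprt_term_code,
       LISTAGG(a.stvactc_desc, ', ')
         WITHIN GROUP (ORDER BY s.sgrsprt_pidm) AS sports
FROM   sgrsprt s
JOIN   stvactc a
       ON a.stvactc_code = s.sgrsprt_actc_code
WHERE  s.sgrsprt_spst_code IN ('AC', '5Y')
GROUP  BY s.sgrsprt_pidm, s.sgrsprt_term_code;
```

Because every row in a group shares the same PIDM, ordering by PIDM alone does not fix the order of the sports within the list.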

The query above groups by student identifier and term, so each row lists all of a student's sports for that term, separated by a comma and one space. Athletes who play in multiple terms will still have one row per term, but that is all right since we will join this to enrollment data by PIDM and term. If we needed the sports within a row to appear in a specific order, say alphabetical, we would simply add STVACTC_DESC after PIDM in the ORDER BY clause of listagg's WITHIN GROUP. Here is the result of the modified query using the listagg function.

TT-July-2019d
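The alphabetical ordering mentioned above can be sketched as follows. To keep the example short, it uses a single hypothetical table, athlete_sports, in place of the joined Banner tables; the table, its columns, and its rows are all made up for illustration.

```sql
-- Hypothetical, pre-joined sample data (not a Banner table).
-- Track is deliberately listed before Basketball.
WITH athlete_sports (pidm, term_code, sport_desc) AS (
  SELECT 1001, '201910', 'Track'      FROM dual UNION ALL
  SELECT 1001, '201910', 'Basketball' FROM dual UNION ALL
  SELECT 1002, '201910', 'Soccer'     FROM dual
)
-- Adding sport_desc after pidm in WITHIN GROUP (ORDER BY ...) sorts the
-- sports alphabetically inside each aggregated list.
SELECT pidm,
       term_code,
       LISTAGG(sport_desc, ', ')
         WITHIN GROUP (ORDER BY pidm, sport_desc) AS sports
FROM   athlete_sports
GROUP  BY pidm, term_code
ORDER  BY pidm;
-- 1001 | 201910 | Basketball, Track
-- 1002 | 201910 | Soccer
```

Note that the ORDER BY at the end of the statement only sorts the output rows; the order of values inside each list is controlled solely by the ORDER BY within WITHIN GROUP.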

Another use of the listagg function is displaying a student together with all the grades that student received for the courses taken in a term. In a given term, a student may take more than one course, and if we want a single record showing the student's details and all of their grades, we can merge the grades into a list separated by a delimiter of our choice.

Please see the query below:

TT-July-2019e

TT-July-2019f
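A grades query along these lines might look like the sketch below. The original code is not reproduced here, so the table student_grades, its columns, and its rows are hypothetical stand-ins rather than actual Banner objects.

```sql
-- Hypothetical sample data: one row per student, term, and course.
WITH student_grades (pidm, term_code, course_id, grade) AS (
  SELECT 2001, '201910', 'ENG 1013', 'B+' FROM dual UNION ALL
  SELECT 2001, '201910', 'BIO 1404', 'A'  FROM dual UNION ALL
  SELECT 2001, '201910', 'MAT 1214', 'A-' FROM dual UNION ALL
  SELECT 2002, '201910', 'ENG 1013', 'C'  FROM dual
)
-- One row per student per term; grades are listed in course_id order
-- so the list lines up with an alphabetical course listing.
SELECT pidm,
       term_code,
       LISTAGG(grade, ', ')
         WITHIN GROUP (ORDER BY course_id) AS grades
FROM   student_grades
GROUP  BY pidm, term_code
ORDER  BY pidm;
```

For student 2001, the courses sort as BIO 1404, ENG 1013, MAT 1214, so the grade list reads "A, B+, A-". If the course should appear next to each grade, the first argument could instead concatenate the two, for example course_id || ': ' || grade.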