Conducting Generalizability Studies with Unbalanced Data

In a recently published article in AIR Professional File (Sturgis, Marchand, Miller, Xu, and Castiglioni, 2022), my co-authors and I introduced generalizability theory (G-theory) to institutional research (IR) practitioners and provided instructions on how to conduct a G-theory study using the Statistical Analysis System (SAS). However, one issue that we did not discuss was how to handle unbalanced data. The purpose of this article is to discuss how to conduct a G-theory study when you are working with unbalanced data. I will first differentiate between balanced and unbalanced data. Then I will discuss what information you can and cannot get from the SAS output when conducting a G-theory study with unbalanced data using the procedure discussed in the Sturgis et al. (2022) article. This will be followed by a discussion of two approaches to conducting a G-study and D-study when working with unbalanced data: deleting data until your data set is balanced, or using specialized software such as G String VI.

Balanced Vs. Unbalanced

In a balanced data set, the number of observations “is equal at each level for the source of variability” (Teker, 2019, p. 59). The data set that we (Sturgis et al., 2022) utilized in our earlier article was balanced. Each student completed all six cases on the exam, and was evaluated by the same number of raters in each case. Because of this, we were able to utilize the variance components that we calculated in the G-study in order to conduct a decision study (D-study) in which we discussed how varying the number of cases would impact the generalizability of the assessment (Sturgis et al., 2022, pp. 8-9).
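To make the D-study idea concrete, the following is a minimal Python sketch of how a relative generalizability coefficient changes as the number of cases varies. It assumes a simple crossed students-by-cases design, which is simpler than the design analyzed in Sturgis et al. (2022), and the variance components shown are hypothetical values chosen only for illustration. The function name and variable names are likewise my own.

```python
def relative_g_coefficient(var_person, var_residual, n_cases):
    """Relative generalizability coefficient for a crossed persons x cases design.

    Assumes the relative error variance is the person-by-case (residual)
    component divided by the number of cases in the D-study.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    relative_error = var_residual / n_cases
    return var_person / (var_person + relative_error)


# Hypothetical G-study variance components (not results from any real study).
VAR_PERSON = 0.30
VAR_RESIDUAL = 0.60

# D-study: see how the coefficient responds to different numbers of cases.
for n in (3, 6, 9, 12):
    print(n, round(relative_g_coefficient(VAR_PERSON, VAR_RESIDUAL, n), 3))
```

Note that the calculation divides by a single number of cases that applies to every student, which is one way to see why the D-study depends on a balanced design.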

In contrast, an unbalanced data set “has unequal numbers of observations in its sub-classifications” (Teker, 2019, p. 59). To use our example above, if some students did not complete all cases on the exam, and/or if the students were evaluated by different numbers of raters, then our data would have been unbalanced. With an unbalanced data set, we would have been unable to conduct the D-study, as we would have been unable to calculate the D-study variance components.
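Before choosing an approach, it can help to check whether a data set is balanced in the first place. The sketch below is one way to do this in Python, assuming the ratings are stored in long format as (student, case, rater) records; that layout and the function names are assumptions made for illustration, not part of the SAS procedure described in the earlier article.

```python
from collections import Counter


def raters_per_cell(records):
    """Count the ratings in each (student, case) cell.

    records: iterable of (student, case, rater) tuples, one per rating.
    """
    return Counter((student, case) for student, case, _rater in records)


def is_balanced(records):
    """Return True if every student has every case with equal numbers of ratings."""
    counts = raters_per_cell(records)
    students = {student for student, _case in counts}
    cases = {case for _student, case in counts}
    # Every student must appear in every case ...
    all_cells_present = len(counts) == len(students) * len(cases)
    # ... and every cell must hold the same number of ratings.
    equal_cell_sizes = len(set(counts.values())) <= 1
    return all_cells_present and equal_cell_sizes


# Hypothetical example: two students, two cases, two raters per case.
balanced = [(s, c, r) for s in "AB" for c in (1, 2) for r in ("r1", "r2")]
unbalanced = balanced[:-1]  # one rating is missing for student B, case 2

print(is_balanced(balanced), is_balanced(unbalanced))
```

The output of raters_per_cell can also be inspected directly to see which students or cases are responsible for the imbalance.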

Unbalanced Data with the SAS Approach

As discussed above, if the data that we used in our article had been unbalanced, we would have been unable to calculate the D-study variance components. Additionally, we would have been unable to calculate an overall generalizability coefficient (Eρ²). However, being unable to conduct a D-study or calculate an overall generalizability coefficient with unbalanced data under the SAS approach does not mean that you cannot use a G-theory analysis to draw useful conclusions from your data.

The approach that we describe in our earlier article (Sturgis et al., 2022) can still be used with unbalanced data to estimate the G-study variance components, which can often go a long way toward answering your overall research question. For example, one of our goals in the earlier analysis was to evaluate whether the raters were evaluating the students in a consistent manner. Our analysis determined that approximately 14 percent of the overall variance was attributable to the case nested within the rater. We determined that this was more than optimal. However, because more than twice as much variance was attributable to student * rater, we concluded that 1) the raters were evaluating the students in a fairly consistent manner, and 2) the students were demonstrating different levels of mastery across the different cases. If our analysis had instead shown that the majority of the variance was attributable to the rater, we would have concluded that too much of the variance was attributable to the raters for the assessment to be considered reliable. Thus, even when you are unable to conduct a D-study or calculate an overall generalizability coefficient due to the unbalanced nature of your data, G-theory can still be used to draw useful conclusions from the data.
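The interpretation described above rests on expressing each estimated variance component as a percentage of the total variance. A minimal Python sketch of that arithmetic follows. The component labels and values are hypothetical and are not the estimates from Sturgis et al. (2022); in practice you would substitute the variance components reported in your own SAS output.

```python
def percent_of_variance(components):
    """Express each variance component as a percentage of the total variance.

    components: dict mapping a source of variance to its estimated component.
    """
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: 100.0 * value / total for name, value in components.items()}


# Hypothetical G-study variance components, for illustration only.
example_components = {
    "student": 0.20,
    "rater": 0.05,
    "case within rater": 0.10,
    "student * rater": 0.25,
    "residual": 0.40,
}

for source, pct in percent_of_variance(example_components).items():
    print(f"{source}: {pct:.1f}%")
```

Comparing these percentages across sources is what allows you to judge, for example, whether the raters account for more of the variance than you are comfortable with.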

Two Approaches

There are two major approaches to dealing with unbalanced data when you wish to conduct both a G-theory analysis and a D-study. One approach is to delete data until a balanced design is achieved. For example, if some of the students in our earlier study had completed only five of the six cases, and some of those students had been evaluated by only three raters, we could have deleted the excess data until we had a data set in which all students were evaluated on five cases by three raters. We would then have analyzed the data using the same procedures that we used for our six cases by five raters design. If you are analyzing data in SAS and need to conduct a D-study or calculate an overall generalizability coefficient (Eρ²), this is your only choice. The obvious disadvantage of deleting data is that it reduces the sample size, which is particularly problematic if the final data set will contain fewer than 50 cases (Atilgan, 2013).
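The deletion approach can be sketched in Python as follows. This is only one possible way to trim a data set, under several assumptions that are mine rather than the earlier article's: the ratings are stored as (student, case, rater, score) records, the analyst has already chosen which cases to keep, students lacking enough ratings in any kept case are dropped entirely, and surplus ratings are removed at random with a fixed seed so the result is reproducible.

```python
import random
from collections import defaultdict


def trim_to_balanced(records, keep_cases, n_raters, seed=0):
    """Delete data until every retained student has n_raters ratings per kept case.

    records: list of (student, case, rater, score) tuples.
    keep_cases: ordered list of the cases to retain.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for record in records:
        student, case = record[0], record[1]
        if case in keep_cases:
            cells[(student, case)].append(record)

    students = sorted({student for student, _case in cells})
    balanced = []
    for student in students:
        student_cells = [cells.get((student, case), []) for case in keep_cases]
        # Drop the student entirely if any kept case has too few ratings.
        if any(len(cell) < n_raters for cell in student_cells):
            continue
        # Otherwise keep a random subset of n_raters ratings in each cell.
        for cell in student_cells:
            balanced.extend(rng.sample(cell, n_raters))
    return balanced


# Hypothetical data: A is complete, B has too few ratings, C is missing case 2.
demo = (
    [("A", case, f"r{i}", 3) for case in (1, 2) for i in range(3)]
    + [("B", 1, "r0", 4), ("B", 1, "r1", 2), ("B", 2, "r0", 5)]
    + [("C", 1, f"r{i}", 4) for i in range(3)]
)
trimmed = trim_to_balanced(demo, keep_cases=[1, 2], n_raters=2)
print(len(trimmed), sorted({record[0] for record in trimmed}))
```

Comparing the number of students before and after trimming shows how much of the sample the balanced design costs, which is the trade-off noted above.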

The second approach to dealing with unbalanced data in G-theory is to use a specialized software package, such as G String VI. G String VI is based on the urGENOVA program, developed by Robert Brennan, a prominent researcher in the area (Brennan, 2001), and runs on all major platforms. Detailed instructions for G String can be found in the user manuals (Block and Norman, 2018; Block and Norman, 2021) as well as in Teker’s 2019 article.

Although G String is designed to analyze unbalanced data, there is a caveat: there is a limit to how unbalanced the data set can be. G String can effectively deal with a design in which, for example, different students rate different numbers of teachers on a scale, where some students rate 10 teachers, other students rate 15 teachers, etc. (Block and Norman, 2018, pp. 36-38). However, G String cannot effectively analyze data that are “extremely” unbalanced.

For example, let us imagine that you are analyzing faculty evaluations of students in six different clerkships. In the real world, you might have the majority of students completing all six clerkships, whereas some students only complete one or two of the clerkships. Additionally, unless a statistician designed the assessment procedure, student “A” might have been evaluated by two raters in the first clerkship, five raters in the second clerkship, seven raters in the third clerkship, etc., while student “B” was evaluated by five raters in the first clerkship, two raters in the second clerkship, 12 raters in the third clerkship, etc. If your data set resembles this, then it is too unbalanced to be analyzed with G String, and you will have to either delete data until you have a balanced design, or limit your analysis to calculating the G-study variance components.

Conclusion

G-theory is a useful analytic tool, and institutional researchers can use it to answer many research questions related to student learning outcomes. Institutional researchers should be prepared to analyze unbalanced data because, unless they were involved with the project from the beginning, the majority of the data they encounter is likely to be unbalanced. This article has shown how the SAS procedure discussed in the earlier article (Sturgis et al., 2022) can be used to answer many G-theory related research questions when working with unbalanced data. Additionally, it has discussed two approaches that allow you to conduct both a G-theory analysis and a D-study: deleting data until a balanced design is achieved, and using a specialized software program such as G String VI.

References

Atilgan, H. (2013). Sample size for estimation of g and phi coefficients in generalizability theory. Egitim Arastirmalari-Eurasian Journal of Educational Research, 51, 215-228.

Block, R., & Norman, G. (2018). G String V user manual. Available at: https://healthsci.mcmaster.ca/docs/librariesprovider15/default-document-library/g-string-v5-user-manual.pdf?sfvrsn=9c6f7395_2

Block, R., & Norman, G. (2021). G String VI user manual. Available at: https://github.com/G-String-Legacy/G_String/blob/main/Support/Manual.pdf

Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.

Sturgis, P., Marchand, L., Miller, M., Xu, W., & Castiglioni, A. (2022). Generalizability theory and its application to institutional research. AIR Professional File, Spring, 4-13. https://doi.org/10.34315/apf1562022

Teker, G. T. (2019). Coping with unbalanced designs of generalizability theory: G String V. International Journal of Assessment Tools in Education, 6(5), 57-69. https://doi.org/10.21449/ijate.658747

Paul W. Sturgis is a statistical analyst in the Department of Analysis, Planning, and Accreditation at the University of Central Florida’s College of Medicine. In his current role, he focuses on conducting statistical analyses of institutional and assessment-related data. He can be reached at paul.sturgis@ucf.edu.