Volume 1, May 26, 2004
Chau-Kuang Chen, Meharry Medical College

The ordinal regression method was used to model the relationship between an ordinal outcome variable, i.e., the level of student satisfaction with the overall college experience, and explanatory variables concerning demographics and the student learning environment in a predominantly minority health sciences center. The outcome variable for student satisfaction was measured on an ordered, categorical, four-point Likert scale: 'very dissatisfied', 'dissatisfied', 'satisfied', and 'very satisfied'. Explanatory variables included two demographics, i.e., gender and ethnic group, and 42 questionnaire items related to satisfaction with faculty involvement, curriculum contents, support services, facilities, and leisure activities at the college. The major decisions involved in building the ordinal regression model were deciding which explanatory variables to include and choosing the link function (e.g., the logit link or the complementary log-log link) that demonstrated model appropriateness. In addition, the model fitting statistics, the accuracy of the classification results, and the validity of the model assumption, i.e., parallel lines, were assessed in selecting the best model. The research findings indicated that explanatory variables such as faculty competence and student-faculty relations were significantly associated with satisfaction with the overall college experience. This finding suggests that faculty members have played a major role in creating a pleasant environment that facilitates student satisfaction. In addition, the curriculum content regarding health promotion and disease prevention was significantly associated with satisfaction with the overall college experience.
It may also provide strong evidence that a specific component of the medical curriculum addressed student needs and contributed to the fulfillment of the medical college goal, e.g., delivery of primary care through health promotion and disease prevention. There has been an increasing emphasis on the study of student satisfaction in colleges and universities in America based on the notion that students have needs and rights to participate in quality programs and to receive satisfactory services. The satisfaction surveys provide colleges and universities with real pictures of the key issues perceived by their students. Consequently, the satisfaction results from the questionnaire surveys have been used as feedback information to help college administrators and faculty enhance the quality of programs and services. Different statistical methods used to analyze satisfaction data yield results with different focuses. These methods include descriptive statistics, chi-square, linear regression analysis, multilevel modeling, and ordinal regression techniques. Descriptive statistics, e.g., means, frequencies, and proportions of student responses are often applied to detect the most and the least satisfaction items regarding college programs and services (Cooney, 2000; Damminger, 2001; and Wild, 2000). Chi-square method is used to identify the significant proportion difference for student satisfaction response based on student retention group (Bailey, Bauman, and Lata, 1998). Regression methods such as linear, logistic, and ordinal regression are useful tools to analyze the relationship between multiple explanatory variables and student satisfaction results (Thomas and Galamos, 2002; and Hummel and Lichtenberg, 2001). The regression methods are capable of allowing researchers to identify explanatory variables related to academic programs and services that contribute to the overall college satisfaction. 
These methods also permit researchers to estimate the magnitude of the effect of the explanatory variables on the outcome variable. Therefore, regression methods seem to be superior in studying the relationship between the explanatory and outcome variables. Despite the prevalence of linear and logistic regression analyses, researchers are experiencing the challenge of using ordinal regression analysis to study the ordinal outcome because in part, they have not been fully exposed to the mathematical theory and the application software. Nowadays, the availability of statistical software routines in the Statistical Package for the Social Sciences (SPSS) or the Statistical Analysis System (SAS) makes it computationally possible to build an ordinal regression model. The application of linear, logistic, and ordinal regression methods depends largely on the measurement scale of the outcome variables and the validity of the model assumptions. The outcome variables include continuous scale, (e.g., total satisfaction scores), binary measure (e.g., satisfaction and dissatisfaction ratings), or ordered category (e.g., very dissatisfied, dissatisfied, satisfied, and very satisfied). Linear regression analysis is applicable to the outcome variable measured on a continuous scale while logistic regression analysis works well only for the binary or dichotomous outcome. In linear and logistic regression analyses, the model assumptions of normality and constant variance for the residual and the outcome data points need to be satisfied to demonstrate their appropriateness. If researchers wish to study the effects of explanatory variables on all levels of the ordered categorical outcome, an ordinal regression method must be appropriately chosen to obtain the valid results. 
More examples of ordinal outcomes include certain psychological measurements (e.g., levels of anxiety or depression), rank scores (e.g., letter grades for course work), and the most frequently used Likert scales (e.g., "poor", "fair", "good", and "excellent" ratings). It is implausible to assume normality and homogeneity of variance for an ordered categorical outcome when it contains only a small number of discrete categories. Thus, the ordinal regression model becomes a preferable modeling tool: it does not assume normality and constant variance, but it does require the assumption of parallel lines across all levels of the categorical outcome.

The step-by-step procedures for building, evaluating, and interpreting the ordinal regression model are illustrated in this study. The study followed four sequential steps to create a workable model. First, the potential explanatory variables were examined to determine whether they should be included in the model. Second, the outcome variable was coded as ordered, ranked, categorical values; the explanatory variables were either continuous or discrete measures. Third, the complete and the reduced models, each with the logit link and the complementary log-log (cloglog) link, were used to generate the candidate models. The complete model contained all the explanatory variables, while the reduced model included a predetermined subset of the explanatory variables. The logit and the cloglog links were chosen according to the distribution of the ordinal outcome, either evenly distributed among all categories or clustered around higher categories. Finally, the best model was chosen among all candidate models based on the model fitting statistics, the accuracy of the classification results, the validity of the model assumption, and the principle of parsimony.
Clearly, the ordinal regression is a unique modeling technique in that the outcome variable is measured on the ordered categorical scale, various link functions are readily available to apply, and the validity of the model assumption for parallel lines is essentially assessed. In this study, the ordinal regression model was constructed to explore and examine the relationship between the satisfaction of overall college experience and the explanatory variables concerning demographics and the satisfaction ratings of student learning environment. The study results could lead to a better understanding of the satisfaction of college programs and services from student perspectives. The research question might be formulated as "How well can the satisfaction of the overall college experience be accounted for by the explanatory variables concerning college-learning environment?" The outcome variable of interest was the satisfaction of overall college experience, with a four-level ordinal measure such as "very satisfied", "satisfied", "dissatisfied", and "very dissatisfied". Explanatory variables included two demographics, e.g., gender and ethnic groups, and 42 satisfaction questionnaire items related to faculty involvement, curriculum contents, support services, facilities, and leisure activities in college. Using the ordinal regression method, researchers could identify the significant explanatory variables with their control to enhance student satisfactions regarding college-learning environment. The ultimate goal of the study was to make recommendations to enhance faculty involvement, curriculum contents, and support services as appropriate in the light of the research findings. For decades, researchers in higher education have assessed student satisfaction in three different justifications. 
First, most researchers have measured solely the levels of student satisfaction in order to identify the most and the least satisfaction with college programs and services for accountability reporting and self-improvement purposes. Secondly, some researchers have examined student satisfaction to see if satisfaction ratings of college programs and services associate with the satisfaction of the overall college experience. Lastly, few researchers have investigated student satisfaction items related to the occurrence of the educational events such as student retention and attrition. To obtain various satisfaction results, different statistical methods such as descriptive statistics, chi-square, linear regression, multilevel modeling, and ordinal regression techniques have been commonly found in the literature to analyze student satisfaction questionnaires. Descriptive statistics were extensively used to detect the most and the least satisfactory items that students had experienced with their college programs and services. For instance, the mean responses of student satisfaction survey conducted by Noel-Levitz Company revealed community college student satisfaction. The survey respondents rated highest satisfaction on responsiveness to diverse populations, registration effectiveness, and academic services, while rating the lowest satisfaction on admissions and financial aid, academic advising, and campus support services (Cooney, 2000). Using percentages, means, modes, and qualitative written reports, student satisfaction with the quality of integrated academic and career advising was summarized. The study results indicated that most students were "extremely satisfied" or "very satisfied" with their combined academic and career counseling service (Damminger, 2001). An additional example of making use of descriptive statistics was to compare student satisfaction via frequency distribution between two campuses within a university. 
The study (Wild, 2000) showed that 14 percent of student respondents chose 'very satisfied' ratings in the areas of access to information and student orientation, respectively. Students on both campuses highly rated staff helpfulness, financial aid staff, campus safety and access to computers, while expressing dissatisfaction with off-hours access to registration and the bookstore. Chi-square, linear regression, and multilevel modeling techniques were generally utilized to investigate the association between the explanatory variables and the outcome variable such as student retention and overall satisfaction with academic programs and services. Cross-tabulation and chi-square techniques were used (Bailey, Bauman, and Lata, 1998) to predict college student retention based on satisfaction. A strong relationship between student satisfaction and retention was found on 40 of the 68 questions (59%). Using linear regression and decision tree analysis with the chi-squared automatic interaction detector (CHAID) software program, a study (Thomas and Galamos, 2002) compared student satisfaction responses between academically- and non-academically-oriented student groups. The research results demonstrated that faculty preparedness, social integration, and pre-enrollment opinions emerged as the most important variables contributing to student satisfaction for both groups. Linear regression methods were used to investigate the relationship between student satisfaction and medical school learning environment (Robins, et al, 1997). The study results provided evidence that curriculum structures, (e.g., timely feedback and promotion of critical thinking) were prominent explanatory variables. Using a multilevel modeling technique to analyze survey data, one study (Umbach and Porter, 2001) examined the impact that different departments have on student satisfaction in a large research university. 
The research findings revealed that characteristics of departments such as size, faculty contact with students, research emphasis, and proportion of female students had a significant impact on satisfaction with education within the major. Using an ordinal regression model, a recent study (Hummel and Lichtenberg, 2001) estimated the probabilities of the four ordinal categories ("worse", "can't tell", "better", and "much better") of client improvement in a counseling center. The research findings showed that five explanatory variables were significantly associated with the probability of an outcome category. These variables were previous experience as a client; readiness to change; level of symptomatic and interpersonal distress; pre-counseling clinical status; and the number of counseling sessions in which a client might be involved.

Based on the literature review, one might conclude that descriptive statistics (e.g., means, percentages, and frequency counts), chi-square methods (e.g., cross-tabulation, Pearson's chi-square test, and decision trees with the CHAID software program), linear regression, and multilevel modeling approaches were increasingly utilized to study student satisfaction in relation to various explanatory variables. However, compared to these methods, the ordinal regression method seems to be the most suitable and practical technique for analyzing the effects of multiple explanatory variables on an ordinal outcome that cannot be assumed to be continuous or normally distributed. Researchers do not need to collapse an ordinal outcome into a binary or dichotomous measure for logistic regression analysis, which may lead to the loss of inherent information. Although ordinal regression analysis is currently underused in the field of education, several articles were found in the medical field that illustrated the foundation of the mathematical model and made use of ordinal regression.
In ordinal regression analysis, the two major link functions, i.e., the logit and cloglog links, are used to build specific models. There is no clear-cut method for choosing between the link functions. However, the logit link is generally suitable for analyzing ordered categorical data that are evenly distributed among all categories. The cloglog link may be used to analyze ordered categorical data when higher categories are more probable (SPSS, Inc., 2002). If the logit link is applied, the ordinal regression model may be written as follows:

$$f[g_j(X)] = \log\left\{\frac{g_j(X)}{1 - g_j(X)}\right\} = \log\left\{\frac{P(Y \le y_j \mid X)}{P(Y > y_j \mid X)}\right\} = \alpha_j + \beta X, \quad j = 1, 2, \ldots, k - 1,$$

and

$$g_j(X) = \frac{e^{\alpha_j + \beta X}}{1 + e^{\alpha_j + \beta X}},$$

where $j$ indexes the cut-off points for all categories ($k$) of the outcome variable. If multiple explanatory variables are applied to the ordinal regression model, $\beta X$ is replaced by the linear combination $\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p$ (Bender and Benner, 2000). The function $f[g_j(X)]$ is called the link function that connects the systematic components (i.e., $\alpha_j + \beta X$) of the linear model (Gill, 2001). The $\alpha_j$ represents a separate intercept or threshold for each cumulative probability. The threshold ($\alpha_j$) and the regression coefficient ($\beta$) are unknown parameters to be estimated by means of the maximum likelihood method.

The name of the logit link can be traced back to the logistic regression function, where the odds of event occurrence are defined as the ratio of the probability of event occurrence to the probability of event non-occurrence, i.e., $g(X) / [1 - g(X)] = e^{\alpha + \beta X}$. The log odds, i.e., $\log\{g(X) / [1 - g(X)]\}$, is called the logit, which equals the linear form $\alpha + \beta X$ (Hosmer and Lemeshow, 1989).
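The logit-link equations above can be illustrated numerically. The following is a minimal Python sketch, not the SPSS procedure used in the study: the function names, thresholds, coefficient, and covariate value are all hypothetical, chosen only to show how the cumulative probabilities $g_j(X)$ are turned into per-category probabilities. It follows the $\alpha_j + \beta X$ form written above; some software parameterises the linear predictor as $\alpha_j - \beta X$ instead, which flips the sign of the coefficients.

```python
import math

def cumulative_logit_probs(thresholds, beta, x):
    """Return g_j(x) = P(Y <= y_j | x), j = 1..k-1, under the logit link."""
    eta = sum(b * xi for b, xi in zip(beta, x))  # linear combination beta_1*x_1 + ...
    return [1.0 / (1.0 + math.exp(-(a + eta))) for a in thresholds]

def category_probs(cumulative):
    """Difference successive cumulative probabilities to get P(Y = y_j | x)."""
    bounds = [0.0] + list(cumulative) + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

# Hypothetical values (assumptions for illustration, not estimates from the study):
thresholds = [-2.0, 0.0, 2.0]  # alpha_1 < alpha_2 < alpha_3 for a four-category outcome
beta = [0.5]                   # one regression coefficient
x = [1.0]                      # one explanatory variable

cum = cumulative_logit_probs(thresholds, beta, x)
probs = category_probs(cum)
print("cumulative:", cum)
print("per category:", probs)
```

Because the same $\beta$ is used at every cut-off point, only the thresholds shift the cumulative curves, which is the parallel-lines structure the model assumes.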
Notice that the ordinal regression model is called the cumulative logit model because the model is built on the cumulative response probabilities $g_j(X)$ of being in category $j$ or lower given the known explanatory variable (Walters, et al., 2001). The ordinal regression model with the logit link is also known as the proportional odds model because the regression coefficient (on the log-odds scale) is independent of the category (Bender and Benner, 2000). A part of Table 1 below shows that the cumulative response probabilities were calculated for ordinal regression equations in the logit link.

In constructing the ordinal regression model, an alternative choice to the logit link is the cloglog link function. If the cloglog link is used, the ordinal regression model may be written in the following form:

$$f[g_j(X)] = \log\{-\log[1 - g_j(X)]\} = \log\{-\log[P(Y > y_j \mid X)]\} = \alpha_j + \beta X,$$

and

$$g_j(X) = 1 - e^{-e^{\alpha_j + \beta X}},$$

where $j = 1, 2, \ldots, k - 1$ and $j$ indexes the cut-off points for all categories of the outcome variable. Again, if multiple explanatory variables are involved in the ordinal regression model, the linear combination $\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p$ is substituted for $\beta X$ (Bender and Benner, 2000). The term "complementary" comes from $1 - g_j(X)$. Thus, the name of the complementary log-log link function is derived from $\log\{-\log[1 - g_j(X)]\}$, which equals the linear form $\alpha_j + \beta X$. The ordinal regression model with the cloglog link is called the continuation ratio model because it concerns the ratio of two conditional probabilities, i.e., $P(Y = y_j \mid X)$ to $P(Y > y_j \mid X)$. The model with the cloglog link is also called the proportional hazard model because the relationship between the explanatory variables and the ordinal outcome is independent of the category (Bender and Benner, 2000). The other part of Table 1 shows that the response probabilities were calculated for the ordinal regression equations in the cloglog link.
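The cloglog form can be checked in the same way. This is a small Python sketch under stated assumptions: the function names and all numeric values are hypothetical, and it only verifies that applying the link $\log\{-\log[1 - g_j(X)]\}$ to the cumulative probability recovers the linear predictor.

```python
import math

def cumulative_cloglog_probs(thresholds, beta, x):
    """Return g_j(x) = 1 - exp(-exp(alpha_j + beta*x)) for each cut-off point."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return [1.0 - math.exp(-math.exp(a + eta)) for a in thresholds]

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

# Hypothetical values (assumptions for illustration only):
cl_thresholds = [-2.0, -0.5, 0.5]
cl_beta = [0.5]
cl_x = [1.0]

cl_cum = cumulative_cloglog_probs(cl_thresholds, cl_beta, cl_x)
recovered = [cloglog(g) for g in cl_cum]  # should equal alpha_j + beta*x
print("cumulative:", cl_cum)
print("link applied:", recovered)
```

Unlike the logit, the cloglog link is not symmetric around a probability of 0.5, which is one reason it may suit outcomes whose responses cluster in the higher categories.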
The essential features of the ordinal regression model, regardless of the link function, may be briefly described. First, the outcome variable of interest is a grouped and ordered category that may be regrouped from an unobserved continuous latent variable (Scott, et al., 1997). However, it is not clear whether the categories of the ordinal outcome are equally spaced. Second, ordinal regression analysis employs a link function to describe the effect of the explanatory variables on the ordered categorical outcome in such a way that the assumptions of normality and constant variance are not required (McCullagh and Nelder, 1989). Third, the model assumes that the relationship between the explanatory variables and the ordinal outcome is independent of the category because the regression coefficient does not depend on the categories of the outcome variable. In other words, the model assumes that the corresponding regression coefficients in the link function are equal for each cut-off point (Bender and Benner, 2000). Hence, the assumption of parallel lines has to be checked carefully by the test of parallel lines (SPSS, Inc., 2002).

It is interesting to note that the ordinal regression model with the logit link has the property of invariance. If the outcome variable (Y) is coded in the reversed order, only the signs of the regression coefficients are reversed (Greenland, 1994; Walters, et al., 2001). Because of this invariance in the logit link, the study results of the ordinal regression analysis would not be affected by the direction of the coding scheme.

Being quality conscious and student oriented, administrators and faculty are generally concerned with the quality of the programs and services that they offer to students. A graduating student questionnaire has been conducted annually as part of an ongoing evaluation process to solicit student perceptions concerning programs and services.
Hence, survey data were collected for all graduating medical students during the years 1999-2001. The satisfaction items were part of the entire questionnaire and allowed graduating students to report their own satisfaction with the college-learning environment. The questionnaire responses were summarized into a relative frequency distribution and submitted to the school dean for decision-making purposes. With a different and more in-depth focus, the ordinal regression analysis was performed to gain insight into how individual items were associated with overall college satisfaction. SPSS version 11.0 for the PC was used to perform the ordinal regression analysis.

The graduating student questionnaire was considered to be an observational and cross-sectional study rather than an experimental one. This study did not randomly assign students to treatment or control groups, nor did it manipulate any treatment or variable to observe group differences. The questionnaire items consisted of student satisfaction with the overall college experience (i.e., the outcome variable), two demographics, gender and ethnic group, and the 42 satisfaction items (i.e., the explanatory variables). The 42 explanatory variables were interrelated and classified into five pre-determined factors: faculty involvement, curriculum contents, support services, facilities, and leisure activities in college. Factor I - faculty involvement included items such as accessibility to faculty, faculty competence, faculty attitude toward students, quality of instruction, student-faculty relations, and instruction/course evaluation. Factor II - curriculum contents incorporated psychological factors in health/illness, cultural factors in disease development, medical ethics, health promotion/disease prevention, HIV/AIDS, clinical skills, communication skills, interpersonal skills, computer skills, and research skills.
Factor III - support services referred to admission and registration, financial aid services, library services, tutorial program, board review program, personal counseling, and career counseling. Factor IV - facilities covered classroom facilities, laboratory facilities, housing facilities, and parking. Factor V - leisure activities in college was composed of student recreation, cultural events, and social events. Gender was coded 1 for males and 0 for females, while ethnicity was coded 1 for African American and 0 for non-African American. The 42 questionnaire items used a five-point Likert scale: 0 for "not applicable", 1 for "very dissatisfied", 2 for "dissatisfied", 3 for "satisfied", and 4 for "very satisfied". The internal consistency of the survey instrument, which was high for all items combined, was assessed by the alpha reliability: all items combined 0.89 (42 items); faculty involvement 0.87 (10 items); curriculum contents 0.81 (10 items); support services 0.82 (17 items); facilities 0.42 (3 items); and leisure activities 0.75 (2 items).

The primary focus of the study was the formulation of the ordinal regression model, the application of ordinal regression analysis, and the interpretation of study results. The student satisfaction questionnaire was analyzed by the ordinal regression method to achieve the four study objectives: (a) to identify significant explanatory variables, i.e., satisfaction items, in the five factors that influenced overall college satisfaction; (b) to estimate thresholds (i.e., constants) and regression coefficients; (c) to describe the direction of the relationship between the explanatory variables and overall college satisfaction based on the sign (+ or -) of the regression coefficients; and (d) to perform classifications for all satisfaction levels of the overall college experience, and subsequently evaluate the accuracy of the classification results.
The major decisions involved in constructing the ordinal regression models were deciding what explanatory variables to include in the model equation and choosing link functions that would be the best fit to the data set. Two commonly used link functions, e.g., logit link and cloglog link, were chosen to build the ordinal regression models. If the frequency distribution of the ordered categorical outcome exhibited that the data points were evenly distributed in various categories, then the logit link function might be appropriate. If the frequency distribution of the ordered categorical outcome showed that a large percent of student respondents were in higher categories such as very satisfied and satisfied ratings, then the cloglog link function might be suitable. In fact, there was no clear-cut choice of link functions. If one link function did not provide a good fit to the data, then the other link function might be a viable alternative. As a result, it was worth trying the alternative link function to see if the model turned out to be the better one. In addition, the model assumption of parallel lines across the corresponding response categories in the link functions was carefully examined to determine the model adequacy. Because the link functions were used to form the ordinal regression models under a strong assumption of parallel lines, any departures from this assumption might result in the incorrect analysis and conclusion (McCullagh, 1980). Furthermore, the contingency table showing the accuracy of the classification for the ordered categorical outcome was evaluated to determine which link function was superior. In order to interpret the ordinal regression model, researchers would first look at the signs of the regression coefficients. These signs give a great deal of insight into the effects of the explanatory variables on the ordinal outcome. 
A positive regression coefficient indicated that there was a positive relationship between the explanatory variable and the ordinal outcome. Conversely, a negative regression coefficient indicated that there was a negative relationship between the explanatory variable and the ordinal outcome. If the logit link (or cloglog link) was chosen for the model equation, the magnitude of the effect of a specific explanatory variable was expressed as $e^{\beta}$: a one-unit change in that explanatory variable changes the odds (or relative risk) of the event occurrence by a factor of $e^{\beta}$, holding the other explanatory variables constant.

Researchers need to be aware of the potential limitations of the study. Although the graduating student questionnaire data were gathered over a three-year period for a small medical college, the sample size was still too small to yield high statistical power given that many explanatory variables entered the equation for analysis. Additionally, the item responses coded as zero for "not applicable" were treated as missing values and excluded from the study. The large percentage of cells with missing data could lead to an inaccurate chi-square test for the model fitting. Note that the assessment of model goodness-of-fit usually depends on chi-square test results. However, if the number of cells with zero value is large, the chi-square goodness-of-fit statistics may not be appropriate (Agresti, 1990). Therefore, researchers are limited in how well they can assess the overall explanatory power of the models. Finally, the ordinal regression analysis with the logit link and cloglog link could not select a subset of significant explanatory variables by automatic model building methods such as stepwise and backward elimination procedures in the SPSS command language.
Therefore, the selection of explanatory variables in the model depended on the researchers' intuition and a trial and error approach described in the following two paragraphs.

The model construction generally involves the use of the complete and the reduced models along with various link functions to create a pool of candidate models. Examining one candidate model at a time, researchers should use the test of parallel lines as the fundamental step to assess the validity of the model assumption. Candidate models in the pool needed to be discarded if they failed to provide evidence of satisfying the model assumption. Additionally, the model fitting statistics, e.g., pseudo R squares, and the accuracy of the classification results should be used as criteria to screen the candidate models and choose the appropriate ones. Once appropriate models were chosen, researchers could temporarily eliminate a few observations or insignificant explanatory variables (say, one or two) from the questionnaire data to investigate whether the modified models maintained their stability (i.e., the model parameters changed only slightly after the temporary elimination). If the modified models exhibited instability, they needed to be discarded.

Finally, the principle of parsimony should apply to the model construction. Webster's dictionary defines parsimony as stinginess, meaning that if fewer explanatory variables are sufficient to explain the outcome, the regression model does not need to include unnecessary variables. Based on the principle of parsimony, the reduced models that met the above screening criteria should be considered the ideal models. However, without automatic model building methods in the SPSS package, the selection of a "few" "important" explanatory variables to form the reduced models remains a challenging task for researchers. For instance, how did researchers decide which explanatory variables were important?
The questionnaire items on which a large percentage of student respondents expressed satisfaction (i.e., the most satisfactory: 90% or more) or dissatisfaction (i.e., the least satisfactory: 30% or more) might be considered "important" explanatory variables. Another question to be asked was "How many important explanatory variables were needed in the reduced models?" This is a case of not knowing how many underlying variables there are for the given data. Because a minimum ratio (e.g., 1 to 10) of the number of explanatory variables to the sample size is recommended by a logistic regression study (Peng et al., 2002), the number of explanatory variables could be determined by dividing the number of completed questionnaires by 10.

Seven steps in SPSS PC version 11.0 were required to produce the ordinal regression model:

Step 1 - Click Analyze, click Regression, and click Ordinal.
Step 2 - Click over exp (dependent variable), and click the <right arrow> sign to move it to the dependent box.
Step 3 - Hold down the CTRL key, click all independent variables, and click the <right arrow> sign to move them to the covariates box.
Step 4 - Click the <down arrow> sign to display the ordinal regression options, select Logit Link or Complementary Log-Log Link, then click Continue.
Step 5 - Click the Output button and select the display items: goodness of fit statistics, summary statistics, parameter estimates, and test of parallel lines.
Step 6 - Under Save variables, select estimated response probability and predicted category, and click Continue.
Step 7 - Click OK.

From 1999 to 2001, a total of 179 graduating medical students completed and returned the questionnaires, for a response rate of 83% (179/216). The relative frequency distribution of all student satisfaction ratings was prepared. The student respondents were satisfied (50%) and very satisfied (45%) with the overall college experience.
The majority of student respondents seemed to be satisfied with the college programs and services, regardless of gender and ethnic group. The student respondents were most satisfied (i.e., the top 10 item ratings in terms of the total percent of student respondents who reported 'satisfied' or 'very satisfied') with accessibility to faculty, faculty competence, quality of instruction, student-faculty relations, health promotion/disease prevention, HIV/AIDS, medical ethics, clinical skills, communication skills, and the bookstore. In contrast, the student respondents were least satisfied (i.e., the bottom 10 item ratings in terms of the total percent of student respondents who reported 'satisfied' or 'very satisfied') with career counseling, personal counseling, student recreation, computer-assisted instruction, computer skills, research methodology, video services, mail room services, housing facilities, and parking.

The complete model analyzed 148 of the 179 questionnaires; 31 questionnaires were excluded from the study because they had at least one item with missing data or a 'not applicable' rating. The study results for the complete model containing all satisfaction items revealed a number of interesting findings. Among the complete models, the cloglog link was the better choice because it satisfied the 'parallel lines' assumption and yielded larger model-fitting statistics, which will be discussed later. Using the complete model with the cloglog link, Table 2 shows that the two thresholds of the model equation were significantly different from zero and contributed substantially to the values of the response probability in the different categories. In addition, the satisfaction of the overall college experience was significantly associated with five explanatory variables: accessibility to the dean, accessibility to faculty, faculty attitude toward students, student-faculty relations, and HIV/AIDS.
These five significant explanatory variables exhibited positive regression coefficients, indicating that students who rated higher levels of satisfaction on these explanatory variables were likely to rate a higher satisfaction for the overall college experience. Three of these five satisfaction items (60 percent) were related to faculty involvement: accessibility to faculty, faculty attitude toward students, and student-faculty relations. Furthermore, none of the satisfaction items regarding facilities, support services, and leisure activities in college was significantly associated with the satisfaction of the overall college experience.
Using the complete model with the logit link to build the ordinal regression model, the satisfaction of the overall college experience was found to be significantly associated with six explanatory variables: ethnicity, accessibility to the dean, accessibility to faculty, student-faculty relations, health promotion/disease prevention, and HIV/AIDS. However, because the complete model with the logit link failed to satisfy the 'parallel lines' assumption (i.e., convergence could not be attained according to the SPSS printout), these findings should be discarded. Therefore, it is unnecessary to present a table of item names, regression coefficients, and p values for that model in this paper.

The model-fitting statistic, namely the pseudo R square, measured the success of the model in explaining the variation in the data. The pseudo R square was calculated from the likelihood ratio. For example, McFadden's R square compared the likelihood of the intercept-only model with the likelihood of the model containing the explanatory variables in order to assess the model goodness of fit. The interpretation of the pseudo R square in the ordinal regression model was similar to that of the R square (i.e., the coefficient of determination) in the linear regression model: it indicated the proportion of variation in the outcome variable accounted for by the explanatory variables. The larger the pseudo R square, the better the model fit. The pseudo R squares for McFadden (.56), Cox and Snell (.60), and Nagelkerke (.75) in the complete model with the cloglog link were larger than those for McFadden (.49), Cox and Snell (.55), and Nagelkerke (.68) in the complete model with the logit link. The additional model fitting statistic, Pearson's chi-square (X2 = 228.57 with d.f. of 242 and p = .723), for the complete model with the cloglog link indicated that the observed data were consistent with the estimated values in the fitted model. However, the Pearson's chi-square test statistic (X2 = 282.46 with d.f. of 242 and p = .038) for the complete model with the logit link indicated that the observed data were not consistent with the estimated values in the fitted model. Hence, the complete model with the cloglog link was the better model as compared to the complete model with the logit link based upon the chi-square test results.

The test of parallel lines was designed to judge the adequacy of the model. The null hypothesis stated that the corresponding regression coefficients were equal across all levels of the outcome variable. The alternative hypothesis stated that the corresponding regression coefficients differed across the levels of the outcome variable. The chi-square test result (X2 = 60.75 with d.f. of 44, and p = .08) indicated that there was no significant difference in the corresponding regression coefficients across the response categories, suggesting that the model assumption of parallel lines was not violated in the complete model with the cloglog link. However, as previously mentioned, the complete model with the logit link failed to satisfy the assumption of parallel lines.

The cross-tabulating method was used to categorize the classified and the actual responses into a 3 by 3 classification table. Table 3 displays the accuracy of the classification results for the satisfaction response categories. The complete model with the cloglog link correctly classified the categories of "very satisfied" (86%), "satisfied" (82%), and "dissatisfied" (40%). The model demonstrated high prediction accuracy (82%) for all three categories combined.
The classification results of the complete model with the logit link did not need to be presented in this paper because the model failed to satisfy the test of parallel lines. Also, the result of the chi-square test for the model fitting of the complete model with the logit link failed to indicate that the observed data were consistent with the estimated values in the fitted model.
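The cross-tabulation behind such a classification table can be illustrated with a minimal Python sketch. The response data and the function names below are hypothetical and are not taken from the study; the sketch also assumes that the per-category percentages are computed within each actual response category, which the paper does not state explicitly.

```python
from collections import Counter

# Ordered response categories observed in the study (three levels).
CATEGORIES = ["dissatisfied", "satisfied", "very satisfied"]

def classification_table(actual, predicted, categories=CATEGORIES):
    """Cross-tabulate actual (rows) against predicted (columns) categories."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in categories] for a in categories]

def accuracy_by_category(table):
    """Share of each actual category (row) that was classified correctly."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(table)]

def overall_accuracy(table):
    """Correctly classified cases (the diagonal) divided by all cases."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return correct / total

# Hypothetical actual and predicted categories for eight respondents.
actual = ["dissatisfied", "dissatisfied", "satisfied", "satisfied",
          "satisfied", "very satisfied", "very satisfied", "very satisfied"]
predicted = ["dissatisfied", "satisfied", "satisfied", "satisfied",
             "very satisfied", "very satisfied", "very satisfied", "satisfied"]

table = classification_table(actual, predicted)
per_category = accuracy_by_category(table)
overall = overall_accuracy(table)
```

In practice, the predicted category would be the one saved by the statistical package for each respondent (e.g., the "predicted category" variable saved in Step 6 of the SPSS procedure).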
Similar to linear and logistic regression modeling techniques, the principle of parsimony was applicable to the construction of the ordinal regression model. The argument is that if the complete models containing all explanatory variables were too complex, they could result in inaccurate estimation of the parameters and instability of the model structure. Based on the above modeling strategy, the reduced models with the logit and cloglog links were constructed to include only 20 explanatory variables: the top 10 and the bottom 10 item ratings in terms of the total percent of student respondents who reported 'satisfied' or 'very satisfied'. The reduced model analyzed 155 of the 179 questionnaires; 24 questionnaires were excluded from the study because they had at least one item with missing data or a 'not applicable' rating. Table 4 shows the result of the reduced model with the logit link, indicating that the satisfaction of the overall college experience was significantly associated with the satisfaction ratings of three explanatory variables: health promotion/disease prevention, faculty competence, and student-faculty relations.
The results of the reduced model with the cloglog link (e.g., item name, regression coefficient, and p value) did not need to be presented in the paper because the model assumption of parallel lines was violated. The model assumption of parallel lines in the reduced model with the logit link was not violated (X2 = 25.567 with d.f. of 20 and p = .18). In addition, the result of the Pearson's chi-square test (X2 = 208.25 with d.f. of 276 and p = .999) for the reduced model with the logit link indicated that the observed data were consistent with the estimated values in the fitted model. Hence, the reduced model with the logit link was a good model. The three pseudo R squares--McFadden (.37), Cox and Snell (.45), and Nagelkerke (.57)--were high for the reduced model with the logit link. Furthermore, the accuracy of the classification results for the satisfaction response categories is shown in Table 5. The reduced model with the logit link correctly classified the categories of "very satisfied" (78%), "satisfied" (74%), and "dissatisfied" (40%). The model demonstrated fairly high prediction accuracy (75%) for all three categories combined. If the principle of parsimony is considered the most important modeling strategy, then the reduced model with the logit link might be a better model than the complete model with the cloglog link. The reduced model with the logit link appeared to be the best model in this study based on the model fitting statistics, the accuracy of the classification results, and the principle of parsimony.
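For readers who wish to see how the three pseudo R squares relate to the likelihoods of the fitted models, the following Python sketch applies the standard definitions of the McFadden, Cox and Snell, and Nagelkerke statistics. The log-likelihood values are hypothetical placeholders, not figures from this study, and the function name is an assumption made for illustration.

```python
import math

def pseudo_r_squares(ll_null, ll_model, n):
    """Return the McFadden, Cox and Snell, and Nagelkerke pseudo R squares.

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the model with explanatory variables
    n        -- number of observations used to fit both models
    """
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Cox and Snell cannot reach 1; Nagelkerke rescales it by its maximum.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"McFadden": mcfadden,
            "Cox and Snell": cox_snell,
            "Nagelkerke": nagelkerke}

# Hypothetical log-likelihoods; only n = 155 mirrors the reduced model's sample.
r2 = pseudo_r_squares(ll_null=-120.0, ll_model=-75.0, n=155)
for name, value in r2.items():
    print(f"{name}: {value:.2f}")
```

Because the Nagelkerke statistic divides the Cox and Snell statistic by its maximum attainable value, it is always the larger of the two, which matches the ordering of the values reported for each model above.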
Several research findings in this study are worth reiterating. The reduced model with the logit link became the best model based on the screening criteria: the validity of the model assumption, the fitting statistics (e.g., Pearson's chi-square and pseudo R squares), the accuracy of the classification results, the principle of parsimony, and the stability of parameter estimation. Therefore, the major research findings and implications should be drawn from the best model. The two explanatory variables related to the satisfaction of faculty involvement (i.e., faculty competence and student-faculty relations) were identified in the best model. Student satisfaction with faculty involvement significantly contributes to the probability of students expressing satisfaction with the overall college experience. It is to be expected that a small medical college with a low student-faculty ratio could lead to higher student satisfaction ratings regarding faculty involvement. Nevertheless, the finding provides compelling evidence that faculty members have played a significant role in creating a pleasant environment that influences student satisfaction with the overall college experience. In addition, the curriculum content regarding health promotion and disease prevention was significantly associated with the satisfaction of the overall college experience. This may provide evidence that one component of the medical curriculum has addressed the needs of medical students and contributed to the fulfillment of the medical college goal, e.g., delivery of primary care through health promotion and disease prevention. The study suggested that the vast majority of student respondents expressed their satisfaction with faculty (e.g., faculty competence, student-faculty relations) and curriculum content (e.g., health promotion/disease prevention).
The research findings in this study seemed to be consistent with the previous study reported by the University of Michigan (Robins, et al., 1997), where students strongly valued their learning environment, especially with faculty. Overall, this study should be viewed as an important first step for the medical college to explore the relationship between the overall college satisfaction and multiple explanatory variables concerning faculty involvement, curriculum contents, support services, facilities, and leisure activities in college. The knowledge gained from this study would be beneficial to the medical college and its students. The goal was to obtain information from students to establish benchmarks that could help decision makers in the medical college improve medical education. For example, the medical college could pursue its ultimate goal of ensuring student satisfaction with the overall college experience by enhancing faculty involvement and curriculum contents. Medical students could be assured of participating in quality programs supported by capable faculty and adequate curriculum contents.

In this study, the principle of parsimony along with various link functions was adopted to build the candidate models and to search for the best model. Much of the time and energy was devoted to developing candidate models, checking the model assumptions, assuring the model goodness of fit, and consequently selecting the best model for the medical college. The model building itself might be partly statistical methodology and partly the experience and common sense of the researchers. The ordinal regression method provides a viable alternative for analyzing student satisfaction data with an ordered categorical outcome. It does not treat an ordinal outcome as a binary or dichotomous measure, as logistic regression analysis does, which may lead to the loss of information inherent in the ordinal scale.
Also, it does not falsely assume a continuous measure, or the normality and constant variance required by linear regression, when analyzing an ordinal outcome with few categories, which may lead to incorrect analysis. Ordinal regression modeling is well suited to educational research because ordinal outcome variables are frequently encountered in the field and the model assumption of parallel lines is easily verified. This modeling technique is a practical tool that should be added to a practicing researcher's toolkit.

It is convenient for some researchers to analyze an ordinal outcome by means of logistic and linear regression analyses. By altering the measuring scale of the ordinal outcome, researchers are able to analyze data and produce research findings. However, loss of information or incorrect analysis may occur in some cases. For instance, when the scale of outcome categories (e.g., very satisfied, satisfied, dissatisfied, and very dissatisfied) is arbitrarily collapsed into a binary measure (e.g., satisfied and dissatisfied), researchers are forced to use logistic regression analysis to analyze the two levels of the ordinal outcome. By doing so, important information may be lost in the resulting model. Also, when an ordinal outcome with few categories is treated as a continuous measure, the linear regression method is applied to an outcome for which normality and constant variance cannot plausibly be assumed. Using the linear regression method to analyze the ordinal outcome, researchers may produce incorrect estimates and interpretations because the model assumptions are violated. Therefore, if researchers wish to study the effects of explanatory variables on all levels of the ordered categorical outcome, an ordinal regression method should be chosen in order to obtain valid research results.
In this study, the ordinal regression method was used to model the relationship between the ordinal outcome variable, e.g., different levels of student satisfaction regarding the overall college experience, and the explanatory variables concerning demographics and the student learning environment. The outcome variable for student satisfaction was measured on an ordered, categorical, four-point Likert scale--'very dissatisfied', 'dissatisfied', 'satisfied', and 'very satisfied'. Explanatory variables included two demographics, e.g., gender and ethnic groups, and 42 questionnaire items related to the satisfaction of faculty involvement, curriculum contents, support services, facilities, and leisure activities at the college. The research findings indicated that faculty competence, student-faculty relations, and curriculum content regarding health promotion and disease prevention were significantly associated with the satisfaction of the overall college experience. Using the ordinal regression method, researchers could identify the significant explanatory variables within their control to enhance student satisfaction with the college learning environment.

Essentially, four sequential protocols are performed to create an ordinal regression model. First, the explanatory variables are examined to determine whether they should be included in the model. Second, the outcome variable is coded in an ordered, ranked, and categorical fashion, and the explanatory variables are quantified by continuous and discrete measures. Third, the complete and the reduced models as well as the logit link and the complementary log-log (cloglog) link are used to produce the candidate models. The complete model contains all the explanatory variables, while the reduced model includes only a predetermined subset of the explanatory variables.
Finally, the best model is chosen among all candidate models depending upon the model fitting statistics, the accuracy of the classification results, and the validity of the model assumption.

The strengths of the ordinal regression model in this study are briefly described. First, many indicators of student learning outcomes are measured on an ordinal scale. For instance, course performance on a letter grade scale (e.g., A, B, C, and D) and satisfaction levels perceived by students on a Likert scale (e.g., very satisfied, satisfied, dissatisfied, and very dissatisfied) are most appropriately measured by an ordinal scale. Thus, the ordinal regression model has broad application in analyzing diverse student-learning outcomes. Second, comparable to linear and logistic regression models, the ordinal regression model can be used to perform the following tasks: (1) to identify significant explanatory variables that influence the ordinal outcome; (2) to describe the direction of the relationship between the ordinal outcome and the explanatory variables; and (3) to perform classifications for all levels of the ordinal outcome, and subsequently evaluate the predictive validity of the regression model. Third, various link functions such as the logit and cloglog links are readily available to model the effect of the explanatory variables on the ordinal outcome. Fourth, the test of parallel lines can be easily used to assess the validity of the model assumption, and the model fitting statistics (e.g., -2 log likelihood ratio and pseudo R squares) can be used as criteria to screen the candidate models and choose the most appropriate one. Finally, the model assumes that the relationship between the ordinal outcome and the explanatory variables is independent of the category. This assumption implies that the corresponding regression coefficients in the link function are equal for each cut-off point (Bender and Benner, 2000).
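To make the role of the link function and the single set of regression coefficients concrete, the Python sketch below computes category probabilities for one respondent under a cumulative link model with either the logit or the cloglog link. It assumes the common parameterisation in which the cumulative probability at cut-off j equals the inverse link of (threshold j minus the linear predictor); the thresholds, coefficients, and covariate values are hypothetical and are not estimates from this study.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: log(p / (1 - p)) = eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Inverse of the complementary log-log link: log(-log(1 - p)) = eta."""
    return 1.0 - math.exp(-math.exp(eta))

def category_probabilities(thresholds, coefficients, x, inverse_link=inv_logit):
    """Probabilities of each ordered category under the parallel lines assumption.

    P(Y <= j) = inverse_link(threshold_j - sum(b_k * x_k)); the same
    coefficients b_k are used at every cut-off, only the threshold changes.
    Thresholds must be listed in increasing order.
    """
    linear = sum(b * v for b, v in zip(coefficients, x))
    cumulative = [inverse_link(t - linear) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for c in cumulative:
        probabilities.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probabilities

# Hypothetical values: two thresholds give three ordered categories.
thresholds = [-1.0, 1.5]
coefficients = [0.8, 0.5]
x = [1.0, 2.0]

p_logit = category_probabilities(thresholds, coefficients, x)
p_cloglog = category_probabilities(thresholds, coefficients, x, inv_cloglog)
```

With this parameterisation, a positive coefficient shifts probability toward the higher categories as the explanatory variable increases, which is how the positive coefficients reported earlier are interpreted.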
Therefore, it is easy to construct and interpret the ordinal regression model, which requires only one model assumption and produces only one set of regression coefficients. Indeed, the ordinal regression technique provides a viable alternative for analyzing the ordinal outcome. It does not convert an ordinal outcome into a binary or dichotomous measure for logistic regression analysis, which may lead to the loss of information inherent in the ordinal scale. Also, it does not falsely assume a continuous measure, or the normality and constant variance required by linear regression, when analyzing an ordinal outcome with few categories, which may lead to incorrect analysis. Ordinal regression modeling is well suited to the field of education because the ordinal outcome is frequently encountered there and the model assumption of parallel lines is easily verified.

Researchers need to be aware of the limitations in using the ordinal regression model. For instance, the "not applicable" responses to the satisfaction items (e.g., explanatory variables) are treated as missing values and excluded from the study. A large percentage of cells with missing data could lead to a decrease in the actual sample size for the model construction or to an inaccurate chi-square test for the model fitting. Note that the model goodness of fit is usually dependent on the chi-square test result, and the chi-square test result normally depends on the sample size. Hence, if the number of cells with a zero value is large, the chi-square goodness of fit statistics may not be appropriate (Agresti, 1990). Thus, researchers are limited in how well they can assess the model goodness of fit. In addition, the ordinal regression analysis with the logit and cloglog links does not offer automatic model building methods, such as stepwise and backward elimination procedures, for selecting a subset of significant explanatory variables in the SPSS command language.
Therefore, researchers are obliged to rely on their own intuition and experience to select a subset of the important or significant explanatory variables in the model. As a result, much of the time and energy is devoted to developing candidate models, checking the model assumptions, and assuring the model goodness of fit.

The ordinal regression model is strictly built on the model assumption of parallel lines (e.g., equal regression coefficients) for all corresponding outcome categories. If the verification of the model assumption fails, the multinomial logistic regression model, which does not require this assumption, becomes an alternative tool. The multinomial logistic regression model is an extension of binary logistic regression, and automatic model building methods for it are built into SPSS PC version 12.0. In multinomial logistic regression, the outcome variable is categorized into nominal groups: the target groups and the reference group. For example, the very dissatisfied rating is labeled as target group 1; the dissatisfied rating is coded as target group 2; the satisfied rating is considered as target group 3; and the very satisfied rating is treated as the reference group. Three model equations are generated for the nominal outcome with the four categories. Three sets of relative risks are calculated by comparing the probability of an individual student falling into a specific target category (j) with the probability of falling into the reference category (k), e.g., P (Y=yj) / P (Y=yk) (Plank and Jordan, 1997). The magnitude of the effect of a specific explanatory variable can be expressed as the average change, for a one-unit change in that explanatory variable, in the relative risk of an individual student falling into the target category rather than the reference category.

This article is an excellent reminder that there is life beyond linear and even logistic regression.
It walks the researcher through some of the key decision points that are faced but because of the complexity of the topic, it should be seen as an introduction to the topic with additional work needed to use ordinal regression with comfort. The following are some notes.
Bailey, Brenda L.; Bauman, Curtis; Lata, Kimberly A. (1998). Student Retention and Satisfaction: The Evolution of a Predictive Model. Paper presented at the Annual Forum of the Association for Institutional Research, Minneapolis, Minnesota. ERIC No: ED424797
Bender, R. and Benner, A. (2000). Calculating Ordinal Regression Models in SAS and S-Plus. Biometrical Journal 42, 6, 677-699.
Cooney, Frank (2000). A Review of the Results and Methodology in the 1999 Noel Levitz Student Satisfaction Survey at Salt Lake Community College. Salt Lake City, Utah: Salt Lake Community College. ERIC No: ED443482
Damminger, Joanne K. (2001). Student Satisfaction with Quality of Academic Advising Offered by Integrated Department of Academic Advising and Career Life Planning. Glassboro, New Jersey: Rowan University. ERIC No: ED453769
Gill, Jeff (2000). Generalized Linear Models: A Unified Approach. Sage Publications, Thousand Oaks, California.
Greenland, S. (1994). Alternative Models for Ordinal Logistic Regression. Statistics in Medicine, 13, 1665-1677.
Hosmer, David W. and Lemeshow, Stanley (1989). Applied Logistic Regression. John Wiley & Sons, New York.
Hummel, T.J. and Lichtenberg, J.W. (2001). Predicting Categories of Improvement Among Counseling Center Clients. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
McCullagh, P. (1980). Regression Models for Ordinal Data (with Discussion). Journal of the Royal Statistical Society - B 42, 109-142.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall, New York.
Peng, Chao-Ying Joanne; Lee, Kuk Lida; Ingersoll, Gary M. (2002). An Introduction to Logistic Regression Analysis and Reporting. Journal of Educational Research, Sept-Oct 2002, v96 i1 p3(13).
Plank, Stephen B. and Jordan, Will J. (1997). Reducing Talent Loss. The Impact of Information, Guidance, and Actions on Postsecondary Enrollment, Report No. 9. ERIC No: ED405429
Robins, Lynne S.; Gruppen, Larry D.; Alexander, Gwen L.; Fantone, Joseph C.; and Davis, Wayne K. (1997). A Prediction Model of Student Satisfaction with the Medical School Learning Environment. Academic Medicine, Vol. 72, No. 2.
Scott, Susan C.; Goldberg, Mark S.; and Mayo, Nancy E. (1997). Statistical Assessment of Ordinal Outcomes in Comparative Studies. Journal of Clinical Epidemiology, Vol. 50, No. 1, pp 45-55.
SPSS, Inc. (2002). Ordinal Regression Analysis, SPSS Advanced Models 10.0. Chicago, IL.
Thomas, Emily H.; Galambos, Nora (2002). What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. Stony Brook, New York: Stony Brook University.
Umbach, Paul D.; Porter, Stephen R. (2001). How Do Academic Departments Impact Student Satisfaction? Understanding the Contextual Effects of Departments. Paper presented at the Annual Meeting of the Association for Institutional Research, Long Beach, California. ERIC No: ED456789
Walters, S.J.; Campbell, M.J.; and Lall, R. (2001). Design and Analysis of Trials with Quality of Life as an Outcome: A Practical Guide. Journal of Biopharmaceutical Statistics 11(3), 155-176.
Wild, Nancy (2000). Rogue Community College Student Satisfaction Survey, Management Report: Redwood and Riverside Campuses. Grants Pass, Oregon: Rogue Community College. ERIC No: ED448831