The Quality of National Data: IR Professionals’ Ratings

Can you trust national data? Are the data available to institutional researchers reliable?  

AIR asked a random sample of members to rate the quality of 11 large-scale data sets often used in IR studies. Quality was defined as the degree to which data are complete, robust, and accurate. Opinions were shared by 151 AIR members (a response rate of 31%).  

Nearly all survey respondents know IPEDS data well enough to have opinions about its quality: only 7% did not offer ratings (that is, they had no opinion or did not know about quality). IPEDS stands out as a star among the data sets reviewed in that it is well known and its quality is rated as very high or high by 66% of respondents.

Respondents also know and have opinions about U.S. News sources—only 25% could not rate U.S. News Best Colleges data, and 34% could not rate U.S. News World’s Best Universities data. However, these data received the lowest marks of all sources reviewed. Only 8% of respondents offered very high or high ratings for Best Colleges, and even fewer respondents (5%) indicated positive ratings for World’s Best Universities. 

Less is known about the quality of other sources included in the survey. More than 60% of respondents did not rate five data sets: the National Community College Benchmark Project, the Consortium for Student Retention Data Exchange (CSRDE), the Survey of Doctorate Recipients, the National Survey of Recent College Graduates, and the National Study of Instructional Costs and Productivity (Delaware Study). An additional source, the College and University Professional Association for Human Resources (CUPA-HR), was not rated by 56% of respondents.  

However, when “don’t know” and “no opinion” responses were removed, several of the lesser-known data sources were rated as very high or high by a majority of those who offered opinions. CSRDE, the Survey of Doctorate Recipients, and CUPA-HR were especially strong performers, with very high or high ratings of 71%, 71%, and 68%, respectively. These highly regarded data sets may deserve wider attention from institutional researchers.
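To make the recalculation concrete, here is a minimal sketch of how a "very high or high" share changes once non-raters are excluded. The counts are hypothetical and are not the survey's actual figures. The rating categories other than "very high" and "high" are also assumptions, as is the function name.

```python
def share_among_raters(counts,
                       positive=("very high", "high"),
                       excluded=("don't know", "no opinion")):
    """Return the positive share among respondents who offered a rating."""
    # Drop respondents who did not rate the data set.
    rated = {k: v for k, v in counts.items() if k not in excluded}
    total_rated = sum(rated.values())
    if total_rated == 0:
        raise ValueError("no respondents offered a rating")
    return sum(rated.get(k, 0) for k in positive) / total_rated


# Hypothetical counts for one lesser-known data set (illustration only).
example_counts = {
    "very high": 10, "high": 20, "moderate": 8, "low": 2, "very low": 0,
    "don't know": 50, "no opinion": 10,
}

# Share of positive ratings among ALL respondents, non-raters included.
share_of_all = (example_counts["very high"] + example_counts["high"]) / sum(
    example_counts.values()
)

# Share of positive ratings among only those who offered an opinion.
share_of_raters = share_among_raters(example_counts)

print(f"Among all respondents: {share_of_all:.0%}")
print(f"Among raters only:     {share_of_raters:.0%}")
```

With these made-up counts, a data set that looks modest among all respondents looks strong among those who know it well enough to rate it, which is the pattern described above. The second figure also rests on fewer respondents, so it deserves more caution.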

These results show wide variation in the perceived quality of the 11 data sources included in the mini-survey. Respondents’ comments indicated three main concerns regarding data quality: questions about methodologies, inconsistent definitions, and the possibility of inaccurate results due to matching errors. Many respondents noted that they do not expect any data set to be “perfect,” and one individual suggested that “the question should be: What level of variation do we accept as good enough to do IR work?”

Join the conversation and share your thoughts and questions about the quality of large-scale data sets below, and view the full survey results.




Total Comments: 3
Marlene posted on 5/8/2013 10:22 AM
This is an important topic as it seems that so many different organizations have some means by which they are trying to assess what higher education institutions are doing. I was just sent a link yesterday to Pay Scale's ranking of institutions for ROI. While they are clear about their methodology, questions remain...such as, how many responses form the basis for my institutional ranking?
Gerry posted on 5/9/2013 12:30 PM
This is interesting and potentially important, since it comes from those primarily charged with SUPPLYING those data to those surveys. Higher education researchers, often without IR backgrounds, should be aware of the concerns for data quality before they use some of these surveys for applied/basic research . . . garbage in, garbage out!
Eric posted on 5/10/2013 12:39 PM
I appreciate that we have a growing number of national data sets, but I would add that my concerns have always been about consistency of measurement. For example, decision makers like to point to retention rates; however, there are various ways to measure retention. Even if there were a single, universally recognized measurement, as Gerry posted, we would still need to be aware of validity/reliability issues for the data providers. You're right -- garbage in, garbage out.