Dealing With Questionable Historic Data

Ask eAIR invites questions from AIR members about the work of institutional research, careers in the field, and other broad topics that resonate with a large cross-section of readers. Questions may be submitted to eAIR@airweb.org.

This month’s question is answered by Gary Lowe, Principal Analyst, Institutional Planning & Analysis, University of California-Merced.     

The ideas, opinions, and perspectives expressed are those of the author, and not necessarily AIR. Members are invited to join the discussion by commenting at the end of the article.

Dear Gary: How do you deal with questionable/bad historic data? Do you go back and investigate/change/update 10+ years’ worth of reports or just cut your losses and move forward with a better system?

Data cleaning is one of those areas that often creates debate within institutional research offices. Some people view census or snapshot data as sacred, while others believe you should make revisions when incorrect data is discovered. There is no single correct answer, so you should handle circumstances on a case-by-case basis. If your campus has a data governance group, this is an excellent topic for them to discuss, since data cleaning might involve more than the IR office.

My approach is to first consider whether the incorrect information has been reported to IPEDS. You can run into instances where grant proposals want data to match IPEDS information. Once information is publicly available and considered “official,” I recommend leaving the data as is unless someone can articulate a very strong reason for making changes.

If the information is not reported officially but can significantly affect any analysis done using the data, there are a few items to consider. First, you need to evaluate the extent of the data quality problem. Was it a one-time issue? If so, consider excluding that semester/quarter from the analysis. However, if the data quality problem persisted over a longer period, consider bringing the issue to the data governance committee.
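One way to judge whether a problem is a one-time issue is to count suspect values term by term. The sketch below is only an illustration in Python: the term labels, the residency codes, and the set of valid codes are all hypothetical, and your own definition of a "suspect" value will depend on the field in question.

```python
from collections import Counter

# Hypothetical set of codes considered valid for this field (an assumption).
VALID_CODES = {"R", "N"}

# Hypothetical (term, residency_code) pairs from census snapshots.
records = [
    ("2009FA", "R"), ("2009FA", "1"), ("2009FA", "N"),
    ("2010FA", "R"), ("2010FA", "N"),
    ("2011FA", "2"), ("2011FA", "R"),
]

# Count all records and suspect records per term.
totals = Counter(term for term, _ in records)
suspect = Counter(term for term, code in records if code not in VALID_CODES)

# Terms with at least one suspect value.
affected_terms = sorted(suspect)

for term in sorted(totals):
    share = suspect[term] / totals[term]
    print(f"{term}: {suspect[term]} of {totals[term]} suspect ({share:.0%})")

# If only one term is affected, one option is to exclude it from the analysis.
records_excluding_affected = [
    (term, code) for term, code in records if term not in suspect
]
```

If the suspect values cluster in a single term, excluding that term may be enough. If they are spread across several terms, as in this made-up data, the per-term counts give the data governance committee a concrete picture of the problem.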

Before bringing the problem to their attention, do some research so you can provide the committee with an idea of the extent of the problem.

  1. Where did the data quality issue occur? If the problem was in the source system, determine the ease of changing the data. Many will view changing old data in source systems as an inefficient use of limited resources.

  2. If changing the source system is not practical, can a derived variable be easily created that “corrects” the data quality issue and allows research projects or reports to reflect the data more accurately?

  3. What is a rough estimate of how much time data cleaning would require, and how much does the bad data affect the analysis?
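The derived variable mentioned in item 2 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the field names, the legacy codes, and the mapping from legacy to current codes are hypothetical, not taken from any particular student information system. The key idea is that the original value is left untouched and the correction lives in a separate, flagged field.

```python
# Hypothetical mapping from legacy codes to current codes (an assumption).
LEGACY_RESIDENCY_MAP = {"1": "R", "2": "N"}

# Hypothetical records as they might appear in a census snapshot.
records = [
    {"student_id": 1, "term": "2009FA", "residency": "R"},
    {"student_id": 2, "term": "2009FA", "residency": "1"},
    {"student_id": 3, "term": "2012FA", "residency": "N"},
    {"student_id": 4, "term": "2008FA", "residency": "2"},
]


def add_corrected_residency(record):
    """Return a copy of the record with a derived, corrected field.

    The original 'residency' value is preserved so the official
    snapshot can still be reproduced.
    """
    original = record["residency"]
    corrected = LEGACY_RESIDENCY_MAP.get(original, original)
    return {
        **record,
        "residency_corrected": corrected,
        "residency_was_corrected": corrected != original,
    }


derived = [add_corrected_residency(r) for r in records]

# A simple count of corrected records, useful for a footnote.
corrected_count = sum(r["residency_was_corrected"] for r in derived)
print(f"{corrected_count} of {len(derived)} records corrected")
```

Keeping the flag alongside the corrected value makes it easy to report how many records were changed and to rerun the analysis on the original values if someone asks for figures that match the official snapshot.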

Once these questions have been answered, you can evaluate the circumstances and decide how to proceed. If you or the data governance committee do elect to make changes from the official data sources, be sure you document and footnote your research or reports so that everyone understands where changes were made and the logic behind making those changes. Keep in mind that being able to replicate findings is an important component of research. Other people may want to follow up on your research findings, or you may be asked to update the information in a few years.

 
