Using Large Datasets – Opinion Essay

By Patricia Windham, Ph.D.

NOTE: eAIR Special Features foster broad knowledge and appreciation of the diverse membership of AIR, and of the different professional contexts and activities in which members are engaged. This opinion piece was peer-reviewed by members of the AIR publications volunteer panel; the ideas, opinions, and perspectives expressed are those of the author and not necessarily those of AIR. Members are invited to join the discussion by commenting below.

We are living in the era of large datasets. Billions of pieces of information are being collected daily and stored on servers. These servers know more about us than we know about ourselves. We have exchanged our privacy for all types of offers and coupons. We seem to have decided that collecting data just because we can is a wonderful idea. The data-collection genie is out of the bottle, but what happens now? Can we use all these data to help us better understand the reality of our world? The major question now is how the data will be transformed into information and, in turn, how that information will be used.

In education, we have a multitude of records on each student at each major point in their educational careers. Data are collected and stored from PreK–12 through graduate school, but what are we doing with them? Currently, we are filling out compliance reports for national, state, and local agencies. We are producing transcripts and investigating basic trends. We are completing teacher evaluations and sharing some data with stakeholders. We track students into the workforce to demonstrate how well we prepare them for the workplace. But are we really using all of these bits and bytes to produce better outcomes?

I am concerned that we are still in the throes of data collection, in a “see what all I have” mode rather than a data-use mode. Clearly, time and effort must be spent on data collection and cleaning so that results will reflect what is actually happening in our institutions. But we must not stop there.

We must begin to ask ourselves what we want to know. How can we move beyond compliance reports toward results that help our students? We have seen the rankings, including results from the Trends in International Mathematics and Science Study (TIMSS), which indicate that the United States is behind many of the world’s industrialized nations in terms of what our students know and can do. Can we use these large datasets to improve our day-to-day instruction?

We are constantly told by businesses that even college graduates do not have the necessary skills for entry-level positions, much less those needed to move up in a company. Can we use the information contained in massive datasets to correct this?

A starting point would be enabling these datasets to talk to each other. For example, Florida used to have three large datasets: one each for K-12, community colleges, and universities. The datasets contained information at the individual student level, but they were controlled by different entities, and the information in one dataset was often not available to people working with another. Until an education data warehouse brought the three sets together, tracking students across systems was almost impossible. As a result, patterns that inform our understanding of student success went undetected, especially as students moved from one system to another. Once the systems were aligned and integrated, it became possible to track students from high school into community colleges and on into universities. Studies using the integrated data enabled the State of Florida to learn more about which students succeed and to identify measures that signal when students may need additional assistance.
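The essay does not describe how Florida's warehouse was built, so what follows is only a minimal sketch of the general idea of letting separate datasets "talk to each other." It assumes that every system's records share a common student identifier (real record linkage often has to match on several fields instead). The names `Enrollment`, `build_warehouse`, `pathway`, and `count_pathways`, along with the toy records, are hypothetical and made up for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical record layout. Each source system (K-12, community colleges,
# universities) keeps its own table, but all are assumed to share a student ID.
@dataclass(frozen=True)
class Enrollment:
    student_id: str
    system: str  # "k12", "community_college", or "university"
    year: int

def build_warehouse(*sources):
    """Merge records from separate systems into one chronological history per student."""
    histories = defaultdict(list)
    for source in sources:
        for rec in source:
            histories[rec.student_id].append(rec)
    for recs in histories.values():
        recs.sort(key=lambda r: r.year)  # order each student's records by year
    return dict(histories)

def pathway(records):
    """Collapse a student's history into the ordered sequence of systems attended."""
    path = []
    for rec in records:
        if not path or path[-1] != rec.system:
            path.append(rec.system)
    return tuple(path)

def count_pathways(histories):
    """Count how many students followed each pathway across systems."""
    return Counter(pathway(recs) for recs in histories.values())

# Made-up toy records for three students, split across three separate systems.
k12 = [Enrollment("A", "k12", 2008), Enrollment("B", "k12", 2008),
       Enrollment("C", "k12", 2009)]
community_college = [Enrollment("A", "community_college", 2009),
                     Enrollment("C", "community_college", 2010)]
university = [Enrollment("A", "university", 2011)]

warehouse = build_warehouse(k12, community_college, university)
for path, n in count_pathways(warehouse).items():
    print(" -> ".join(path), n)
```

No single source table can show that student A moved from high school to a community college and then to a university; that pathway appears only after the records are merged, which is the kind of cross-system pattern the paragraph describes as going undetected.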

As with most things, the large datasets we have available can be used to continue doing what we are doing, or they can be used to help us move our educational system forward. To do the latter, we must begin now to make conscious decisions about how we collect and use data. We have the means to discover many elements of what needs to be done to improve our educational system. The question is whether we have the will to make the hard decisions and follow up on the results of our research.

Patricia Windham, Ph.D., is the former Associate Vice-Chancellor for Research and Evaluation at the Florida Division of Community Colleges.



Michelle posted on 7/10/2013 10:10 PM
Excellent ideas to ponder. Collecting massive quantities of data involves both technical expertise and ethics, and the phrase "just because we can, doesn't mean we should" comes to mind. Thanks for asking the profession to examine these questions and make decisions that will impact higher education for the foreseeable future.
Terry posted on 7/11/2013 8:07 AM
I have to agree that we are still in the "Let's see what we have" phase. As institutional researchers we need to be at the forefront in design and interpretation of what we have. IR needs to step up and be the group to say, "This is what we see and here is where we should focus". Isn't that really the object of collecting those big datasets, to drill down and see where we can have the most impact?
Emily posted on 7/11/2013 8:38 AM
The ability for these large datasets to "talk to each other" is definitely a needed next step. In order for us to investigate the entire pipeline and even employment outcomes, we need to be able to track students through the various systems.
Heather posted on 7/11/2013 9:17 AM
I agree that we need to move our educational system forward AND the way we do business. I love Richard Branson's phrase: Screw Business as Usual, and we need to apply this to education too. Innovation and creativity, coupled with removing current limiting beliefs, will drive us forward: It does not matter how much data you have, what matters is how you use it!
Marlene posted on 7/11/2013 9:58 AM
Very timely piece. We are having similar conversations about the use of data at my institution--for instance, we ask our students to complete this survey and that survey, but are we making the most of all of that data? Can we learn more by connecting these multiple sources of data?
Gina posted on 7/11/2013 11:15 AM
Thoughtful collection and use of data are definitely crucial ideas for IR professionals to ponder. At my former and current institutions we strongly encourage review of available data prior to discussion about survey development as often the data that will answer someone's question is already being collected, just maybe not by them. IR can serve as the guide to data sources that will help answer important questions on campus.
Eric posted on 7/15/2013 12:54 PM
Patricia, thank you for sharing this! This brings to mind how valuable it is for any institution, regardless of whether it is in a state system, to use large data such as the NSC Student Tracker to see where students enroll after they depart.
Jose posted on 8/1/2013 12:06 PM
