Big Data, Small Data

​Ask eAIR invites questions from AIR members about the work of institutional research, careers in the field, and other broad topics that resonate with a large cross-section of readers. Questions may be submitted to

This month’s question is answered by Leslie Wasson, Director of Institutional Research, Assessment & Planning, Phillips Graduate University.

The ideas, opinions, and perspectives expressed are those of the author, and not necessarily those of AIR. Members are invited to join the discussion by commenting at the end of the article.

Dear Leslie: My president and board members have been hearing about big data in the popular media, and they want to know what we are doing about it. How do I explain to them the differences between big data and what we have, which is small data? They want to be assured they can use our information to make good decisions.

You are very fortunate to have a board and president who are interested in new ideas and additional knowledge. You want to encourage such questions while supporting good decision-making based on your local context.

Although there is no firm dividing line at this time, big data is often measured in petabytes and exabytes. It tends to be unstructured and spread across numerous sources. The size and complexity of the data available to larger organizations exceed what more traditional systems, such as data warehouses or relational databases, can store, retrieve, and analyze in a reasonable time frame.

A data warehouse can be defined as “a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.” Unlike the live databases of student, faculty, and staff interaction, a warehouse often will be a series of historical snapshots that allow comparisons over time and trend identification.
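To make the idea of historical snapshots concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table name `enrollment_snapshot`, its columns, and every number in it are invented for illustration; a real warehouse would have its own schema and far more rows.

```python
import sqlite3

# Hypothetical warehouse table: one row per program per census snapshot.
# All names and numbers are invented for illustration.
SNAPSHOTS = [
    ("2014-10-15", "Psychology", 120),
    ("2015-10-15", "Psychology", 132),
    ("2016-10-15", "Psychology", 141),
    ("2014-10-15", "Education", 80),
    ("2015-10-15", "Education", 76),
    ("2016-10-15", "Education", 70),
]


def enrollment_trend(rows):
    """Return (program, earliest headcount, latest headcount, change)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE enrollment_snapshot "
        "(snapshot_date TEXT, program TEXT, headcount INTEGER)"
    )
    conn.executemany("INSERT INTO enrollment_snapshot VALUES (?, ?, ?)", rows)
    # Join the earliest snapshot to the latest one, program by program.
    query = """
        SELECT s.program, s.headcount, e.headcount,
               e.headcount - s.headcount
        FROM enrollment_snapshot AS s
        JOIN enrollment_snapshot AS e ON s.program = e.program
        WHERE s.snapshot_date =
              (SELECT MIN(snapshot_date) FROM enrollment_snapshot)
          AND e.snapshot_date =
              (SELECT MAX(snapshot_date) FROM enrollment_snapshot)
        ORDER BY s.program
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for program, first, last, change in enrollment_trend(SNAPSHOTS):
    print(f"{program}: {first} -> {last} ({change:+d})")
```

Because each snapshot is frozen on its census date, the comparison stays stable even as the live student system keeps changing underneath it.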

Most of us in higher education will not have the kind of data volume and variety that lends itself to big data solutions. We have small data, or in the case of a large state system, medium data. Most of us will have the dedicated servers or cloud storage to maintain our historical data in an integrated data warehouse setting. Our queries will be fairly straightforward and our models less complicated than the algorithms used to predict consumer behavior and identify potential new markets. If that changes, we can always weigh the strategic advantages and costs of a big data/Business Intelligence (BI) solution.

College and university data are related to each other, which suits the integrated warehouse approach. Tools that make the data accessible might include smaller data marts or reporting cubes for particular areas, such as financial aid or alumni contacts. These lend themselves nicely to self-service query and reporting tools, giving users the option to be in touch with the institutional data and its patterns over time.
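As a rough illustration of what a reporting cube does behind a self-service tool, the sketch below pre-aggregates a few award records by year, aid type, and residency, then answers questions by summing the matching cells. The function names `build_cube` and `slice_cube`, the dimensions, and all amounts are hypothetical and not drawn from any particular BI product.

```python
from collections import defaultdict

# Hypothetical detail rows from a financial aid data mart:
# (award_year, aid_type, residency, amount). Values are invented.
AWARDS = [
    (2015, "grant", "in-state", 3000),
    (2015, "loan", "in-state", 5500),
    (2015, "grant", "out-of-state", 2000),
    (2016, "grant", "in-state", 3200),
    (2016, "loan", "out-of-state", 6000),
]


def build_cube(rows):
    """Pre-aggregate total amount for each (year, aid_type, residency) cell."""
    cube = defaultdict(int)
    for year, aid_type, residency, amount in rows:
        cube[(year, aid_type, residency)] += amount
    return dict(cube)


def slice_cube(cube, year=None, aid_type=None, residency=None):
    """Sum the cells matching the given dimension values; None means all."""
    wanted = (year, aid_type, residency)
    return sum(
        total
        for key, total in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )


cube = build_cube(AWARDS)
print("All grants:", slice_cube(cube, aid_type="grant"))
print("All 2015 aid:", slice_cube(cube, year=2015))
```

The user picks values for the dimensions they care about and leaves the rest open, which is the kind of question a self-service reporting tool is built to answer without a custom query.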

While the pace of change in higher education has been increasing, it is still slow compared to the corporate world. Big data solutions are popular right now, and by the time they become feasible for educational organizations, many of the bugs will have been worked out.

In sum, big data and BI have broad applications when you have too much data to manage, but they are overkill when you don't. In the words of the experts: “Big data can be contrasted with small data, another evolving term that's often used to describe data whose volume and format can be easily used for self-service analytics. A commonly quoted axiom is that ‘big data is for machines; small data is for people.’”

You can reassure your leaders with your knowledge of the differences between big and small data and the fact that you are keeping up with new developments in your field. They can be applauded for staying current and asking good questions to help the organization be at its best.





Total Comments: 2
William posted on 5/12/2016 10:47 AM
Thanks. This is one of the most cogent discussions on these issues I have heard in a long time.
Frank posted on 5/12/2016 2:29 PM
Thank you for the well-written article about the challenging uses of our data.

Thinking back over my 30 years of IR experience, I have found that it is critical to ask the right questions and to know our users and the campus. We can have all the data in the world about our campuses and students, but if we don't ask the right questions, our data are absolutely worthless, if not dangerous. I'm personally worried about giving a Board member, faculty member or administrator a tool to access the data when he/she doesn't know the data or the tool. I think we in IR are getting absorbed in compiling the best data and making it available online. "Self-serve" is tantalizing, but fraught with major dangers. When our users don't know the right questions, "self-serve" data will soon become a total waste of their time. I don't know how to teach people to ask the right questions, but I've found that a discussion with the person about his/her issue helps greatly.

I'm an older guy, having begun IR using a VAX and an IBM PC XT. But I really enjoy using the new BI toys to access our huge amount of historical data. We uncover many useful things with these tools, but we have to be careful about how we use them. It's nice printing out a nifty report using Tableau and emailing it to a user, but if they don't know what the report is about or cannot interpret it, they could make really bad decisions.

And if a bad decision is made, you know who gets blamed!