We all know the value that our data can bring to our organizations. Some of us who are part of small institutional research offices must focus a great deal of our time on mandated reporting needs and critical ad hoc data. Others of us have the resources to commit to business intelligence systems, leveraging a data warehouse to streamline descriptive statistics and data visualizations to meet the needs of our institutions.
What about advanced analytics, though? It is the obvious next step after descriptive statistics, and online higher education sites seem to mention it all the time. Such techniques may include regression or decision trees from the world of supervised machine learning, or unsupervised methods such as clustering and dimensionality reduction. Deep learning methods that leverage neural networks are also readily accessible and applicable to a range of problems. Such models represent a shift from human-defined rules to systems where an algorithm takes the input data and results and subsequently develops the rules. Predictions or forecasts can then be made by applying those rules to new input data.
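To make that shift concrete, here is a minimal sketch in Python of an algorithm that develops its own rule from input data and results. It is a one-split "decision stump", the simplest relative of a decision tree, and not a production model. The records, the `learn_threshold` and `predict` function names, and the choice of first-term credits as the input are all made up for illustration.

```python
# Hypothetical records: (first-term credits attempted, retained to year two?)
records = [
    (6, False), (9, False), (9, False), (12, True),
    (12, False), (15, True), (15, True), (18, True),
]

def learn_threshold(data):
    """Find the credit cutoff that best separates the two outcomes.

    The learned rule has the form: predict True when credits >= cutoff.
    Ties are resolved in favor of the lowest cutoff.
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted({credits for credits, _ in data}):
        # Count how many records this candidate rule classifies correctly.
        correct = sum((credits >= cutoff) == outcome for credits, outcome in data)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff, best_correct / len(data)

def predict(credits, cutoff):
    """Apply the learned rule to a new input."""
    return credits >= cutoff

# The algorithm develops the rule from the inputs and the known results ...
cutoff, training_accuracy = learn_threshold(records)

# ... and the rule is then applied to new input data.
new_students = [8, 16]
predictions = [predict(credits, cutoff) for credits in new_students]

print(f"Learned rule: retained if credits >= {cutoff}")
print(f"Accuracy on the training records: {training_accuracy:.3f}")
print(f"Predictions for {new_students}: {predictions}")
```

Nobody wrote the cutoff by hand; it came from the data. Real decision tree packages in R or Python repeat this kind of search across many variables and many splits.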
Access to such analytics techniques is very different today than it was 10 years ago. It is no longer necessary to purchase expensive licenses for complicated software. Instead, there are a number of low-cost or free options that provide advanced capabilities.
Before You Begin
No analytics approach is going to benefit your organization without a few fundamentals in place. First, you need to enter clean, well-understood data using well-controlled business processes. You also need to have data that can be connected, so you need to look across the range of institutional data in addition to pockets of data in different silos. You don’t need to have perfect data to get started, but you do need to be aware of the limitations of your data and work to address silos or any data governance issues.
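As a rough illustration of what "data that can be connected" means in practice, the sketch below joins two small extracts on a shared student ID, then reports the records that cannot be matched and the values that are missing. The two systems, the field names, and the `connect` and `missing_fields` helpers are hypothetical; your own silos will look different.

```python
# Hypothetical extracts from two separate systems, both keyed by student ID.
registrar = {
    "S001": {"program": "Biology", "credits": 15},
    "S002": {"program": "History", "credits": 12},
    "S003": {"program": "Nursing", "credits": None},  # missing value
}
advising = {
    "S001": {"advising_visits": 3},
    "S003": {"advising_visits": 1},
    "S004": {"advising_visits": 2},  # no matching registrar record
}

def connect(left, right):
    """Join two keyed data sets and report the records that cannot be matched."""
    matched = {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return matched, only_left, only_right

def missing_fields(rows):
    """List (student ID, field) pairs where a value is absent."""
    return sorted(
        (key, field)
        for key, row in rows.items()
        for field, value in row.items()
        if value is None
    )

matched, registrar_only, advising_only = connect(registrar, advising)
gaps = missing_fields(matched)

print("Connected records:", sorted(matched))
print("In registrar only:", registrar_only)
print("In advising only:", advising_only)
print("Missing values:", gaps)
```

A quick report like this will not fix a governance problem, but it shows the limitations of the data before any model is built on top of it.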
It is also important to have a deep level of familiarity with advanced analytics methods. This includes understanding exploratory data analysis, the modeling process, and different types of models. It is also necessary to know how you assess the accuracy of different models and how you ensure you have all the relevant data within your model.
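One simple habit when assessing accuracy is to compare each model against a naive baseline on data the model has not seen. The sketch below assumes you already have held-out outcomes and predictions from two candidate models; every value here is invented for illustration, and `accuracy` and `majority_baseline` are hypothetical helper names.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Share of cases where the prediction matches the known outcome."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def majority_baseline(train_actual, n_cases):
    """Naive baseline: always predict the most common training outcome."""
    most_common_outcome, _ = Counter(train_actual).most_common(1)[0]
    return [most_common_outcome] * n_cases

# Hypothetical outcomes (True = retained) used for training and held out for testing.
train_actual = [True, True, False, True, True, False, True, True]
test_actual = [True, True, False, True, False, True, True, False, True, True]

# Hypothetical predictions from two candidate models on the held-out cases.
model_a = [True, True, True, True, False, True, True, True, True, True]
model_b = [True, False, False, True, True, True, False, False, True, False]

baseline = majority_baseline(train_actual, len(test_actual))
baseline_accuracy = accuracy(test_actual, baseline)

print(f"Baseline: {baseline_accuracy:.1f}")
for name, predicted in [("Model A", model_a), ("Model B", model_b)]:
    print(f"{name}: {accuracy(test_actual, predicted):.1f}")
```

In this made-up case the baseline is right 70% of the time, so Model B at 60% adds nothing even though it is right more often than not. Accuracy is only one measure; depending on the question, you may care more about which kinds of errors a model makes.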
Adopting Advanced Analytics
There are many analytics options available through higher education software providers. Some are very good, while others are limited. Because of this, it is often best to leverage in-house resources to complete analytics work. We have likely all seen glossy vendor presentations on the analytics capabilities of their software, and they can make a compelling case, particularly to stakeholders who are unfamiliar with analytics.
However, when deciding whether to purchase an analytics solution or complete the project in-house, you need to start by asking yourself a few questions. Does the product leverage the full range of relevant data? By using the product, are you creating a second data warehouse housed within that system that duplicates other work at your organization? Does the software adequately handle the analytics methods from exploratory data analysis onward? Based on the answers to these questions, and on how well the product aligns with your existing data and analytics strategy, you can determine the best route forward. As the subject matter experts, institutional research offices should be the ones to make that recommendation.
In terms of in-house options for advanced analytics, a strong case can be made for the use of either R or Python. Both are high-level programming languages that include a wide range of analysis methods.
R is a free environment for statistical computing that includes good graphics options and a comprehensive range of model types. It is well supported through an active online community.
Python is also an open-source system, with extensive packages providing access to different numerical methods. However, it is broader in the functionality it provides and extends well beyond statistical computing. Because of that, it is widely used as a development language, and new methods are typically incorporated into Python before they arrive elsewhere. Perhaps for the same reason, Python arguably has a slight edge over R and has maintained a high position in rankings of programming languages in recent years (behind only Java and C in February 2020, according to the TIOBE index), while R has fluctuated in popularity.
Regardless of which software you choose, the following key steps will support your progress into advanced analytics using an open-source tool.
Test the software of interest. Both R and Python are free to install, and extensive online resources are available for each. There may also be low-cost training options through computer software courses at your institution. Further, online education providers such as Coursera have courses available. There are also extensive tutorials online that will walk you through particular examples, so you can learn the syntax and see the systems in action. I particularly like Towards Data Science and the tutorials it provides.
If you look at R and Python, you will likely find that both can meet your technical needs in terms of the functionality that they provide. However, you may find that the syntax in one makes more sense to you and that you enjoy using one tool more than the other. Work with your IT department to assess which one best fits your existing IT infrastructure.
With the software selection made, look across your data and identify a strong first project you can use to leverage advanced analytics. Is your institution facing a particular challenge that analytics can help you understand? Is there an operational area at your institution that is more likely to buy in to using analytics results? Alternatively, is there a high-profile project that you can complete to support institutional visibility of analytics that will promote buy-in? There is no wrong place to start, but be selective and choose a high-impact project.
Last, and most important, have fun! We have many tools at our disposal to help us more fully leverage our data to support the success of our students. Advanced analytics is one of those options, and it is more accessible than ever.