Common Analytics Terms: A Quick Reference
Have you ever been involved in a conversation in which two or more people are using the same term but with different meanings? Establishing the meaning of common terms prevents miscommunications and helps move conversations toward decision-making.
Although not exhaustive, the following list includes terms commonly used in analytics conversations. The terms were selected through discussion with the eAIR Editorial Committee.
Analytics is the intentional use of computational analysis to answer questions and propel business interests. There are a variety of specialized subtypes of analytics available to users (Analytics, n.d.).
Artificial Intelligence (AI) uses a variety of advanced techniques including machine learning to analyze big data, interpret events, make predictions, and support or automate decision-making (Gartner, 2021).
Big Data is high-volume, high-velocity, and/or high-variety data assets (Gartner, 2021). Within higher education, big data includes both structured and unstructured data and smaller data sets subject to analysis (Webber & Zheng, 2020, p. 31).
A Cohort is a group of research subjects who satisfy a specific set of criteria at a set point in time. The cohort's outcomes are tracked as part of a research study. An example of a cohort is the IPEDS graduation rate cohort, which is restricted to full-time, first-time degree/certificate-seeking students who enter the institution during a particular year (NCES, n.d., IPEDS Survey Methodology).
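As a rough illustration, a cohort can be built by filtering records against every criterion at once. The sketch below is loosely modeled on the graduation rate cohort described above; the `Student` fields, the sample records, and the function name are all hypothetical and are not taken from IPEDS.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical record layout, for illustration only.
    student_id: str
    entry_year: int
    full_time: bool
    first_time: bool
    degree_seeking: bool


def graduation_rate_cohort(students, year):
    """Return the students who satisfy every cohort criterion for one entry year."""
    return [
        s for s in students
        if s.entry_year == year and s.full_time and s.first_time and s.degree_seeking
    ]


students = [
    Student("A1", 2020, True, True, True),
    Student("A2", 2020, False, True, True),   # part-time, so excluded
    Student("A3", 2020, True, False, True),   # not first-time, so excluded
    Student("A4", 2019, True, True, True),    # entered in a different year
    Student("A5", 2020, True, True, True),
]

cohort = graduation_rate_cohort(students, 2020)
print([s.student_id for s in cohort])
```

Once fixed, the cohort membership does not change; only the outcomes of its members are tracked over time.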
A Dashboard is a reporting mechanism that aggregates and displays metrics and key performance indicators (KPIs), enabling users to examine them at a glance. Dashboards help improve decision making by revealing and communicating in-context insight into performance outcomes, using interactive visualizations that indicate the progress of KPIs toward defined targets (Gartner, 2021).
Data refers to raw numbers or facts that have not been subjected to analysis. Data lacks context (Brown, 2021).
Data-Informed Decision Making uses data resources and analysis to provide insights, context, and evidence to support the decision-making process. Human input remains key (Webber & Zheng, 2020, p. 8).
A Data Warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregate, summary form suitable for organization-wide data analysis and reporting for predefined business needs (Gartner, 2021).
Descriptive Analytics describes past or current events. Common techniques used in descriptive analytics include data mining and summary statistics, such as frequency, percentage, mean, and standard deviation. Results may be displayed as a sliceable dashboard (4 Types, 2019; Mehta, 2017).
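The summary statistics named above can be computed with Python's standard library alone. The following is a minimal sketch; the enrollment figures are made up for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data: credit hours and enrollment status for six students.
credits = [12, 15, 15, 9, 12, 15]
status = ["full-time", "full-time", "full-time", "part-time", "full-time", "full-time"]

# Frequency: how many students fall into each category.
freq = Counter(status)

# Percentage: each category's share of all students.
pct = {category: 100 * count / len(status) for category, count in freq.items()}

# Mean and sample standard deviation of credit hours.
avg = mean(credits)
sd = stdev(credits)

print(freq)
print(pct)
print(avg, round(sd, 2))
```

Note that `stdev` computes the sample standard deviation (dividing by n - 1); `statistics.pstdev` would give the population version.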
Diagnostic Analytics focuses on the past, seeking to determine not only what happened but why. Regression is a common analytical technique used in diagnostic analytics; results may be incorporated into an analytic dashboard (4 Types, 2019; Mehta, 2017).
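To make the regression technique concrete, the sketch below fits a simple least-squares line by hand. The variables (weekly tutoring hours and GPA) and all values are hypothetical, and a fitted slope describes an association in the data rather than proving a cause.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: weekly tutoring hours and end-of-term GPA.
hours = [0, 2, 4, 6, 8]
gpa = [2.0, 2.4, 2.9, 3.1, 3.6]

slope, intercept = fit_line(hours, gpa)
print(f"GPA is about {intercept:.2f} + {slope:.3f} * hours")
```

In practice an analyst would likely use a statistics package and examine more than one explanatory variable, but the underlying idea is the same.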
Information is created by the analysis of raw data and the application of context (Brown, 2021).
Insight is created through the use of data and information in an effort to understand a situation or phenomenon (Brown, 2021).
IPEDS stands for the Integrated Postsecondary Education Data System. IPEDS is a survey system maintained and conducted by the NCES. All postsecondary institutions that participate in federal student financial aid programs are required to submit information (NCES, n.d., About IPEDS).
Machine Learning is a form of artificial intelligence that “trains” algorithms on historical/training data to find patterns. The trained algorithm is then used to make predictions or decisions based on new data (IBM, n.d.).
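The train-then-predict pattern can be sketched with a deliberately simple method, a nearest-centroid classifier, written in plain Python. The features (GPA and credit hours), the labels, and the data are hypothetical, and real applications would normally rely on an established library and far more data.

```python
from collections import defaultdict


def train(examples):
    """Learn one centroid (average feature vector) per label from training data."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, features):
    """Assign new data to the label whose centroid is closest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical historical data: (GPA, credit hours) and the known outcome.
training = [
    ((3.6, 15), "retained"),
    ((3.2, 14), "retained"),
    ((2.1, 9), "not retained"),
    ((1.8, 6), "not retained"),
]

model = train(training)
print(predict(model, (3.4, 13)))
print(predict(model, (2.0, 8)))
```

The "pattern" found here is just the average profile of each group; more sophisticated algorithms learn richer patterns but follow the same two steps.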
NCES stands for the National Center for Education Statistics. NCES is a federal entity with the responsibility of collecting and analyzing US education data and data from other nations (NCES, n.d., About Us).
A Null Hypothesis usually asserts there is no difference or no association, as compared to an alternative hypothesis which asserts there is a non-zero level of difference or association (Everitt & Skrondal, 2010, p. 306). Hypothesis testing is often used to determine if statistical significance exists.
Predictive Analytics uses past events to predict possible future outcomes. Common techniques include machine learning and predictive modeling. Future outcomes are evaluated as probabilities (4 Types, 2019; Mehta, 2017).
Prescriptive Analytics examines the viability of potential courses of action. Common techniques include simulations and neural networks (4 Types, 2019; Mehta, 2017).
A Relational Database contains multiple data sets (tables) which can be linked through the existence of common variables (keys) (IBM, n.d.). In education, a common key would be the student identification number (SIN). The SIN can frequently be used to link a student’s personal demographic data to financial aid, class, and application data.
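The linking of tables through a key can be shown with Python's built-in `sqlite3` module. This is only a sketch: the table names, column names, and records are invented for illustration.

```python
import sqlite3

# An in-memory database with two tables that share a student_id key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        student_id TEXT PRIMARY KEY,
        name       TEXT
    );
    CREATE TABLE financial_aid (
        student_id TEXT REFERENCES students(student_id),
        award      INTEGER
    );
    INSERT INTO students VALUES ('S001', 'Ada'), ('S002', 'Grace');
    INSERT INTO financial_aid VALUES ('S001', 2500), ('S002', 4000), ('S001', 1000);
""")

# Join the tables on the shared key and total each student's awards.
rows = conn.execute("""
    SELECT s.student_id, s.name, SUM(f.award) AS total_award
    FROM students AS s
    JOIN financial_aid AS f ON f.student_id = s.student_id
    GROUP BY s.student_id, s.name
    ORDER BY s.student_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Because each table stores only its own subject matter, the student's name appears once, no matter how many aid records are linked to it.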
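Statistical Significance indicates that an observed result would be unlikely to occur if the null hypothesis were true. It is assessed by comparing the p-value (the probability of obtaining data at least as extreme as those observed, assuming the null hypothesis is true) against a preset significance level, often 0.05; a result is called significant when p falls below that level (Everitt & Skrondal, 2010, p. 393).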
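One way to see where a p-value comes from is an exact permutation test, sketched below with the standard library. The null hypothesis is that group membership makes no difference, so the test regroups the pooled scores in every possible way and counts how often the difference in means is at least as large as the one observed. The two groups and their scores are hypothetical, and this is one of several valid testing approaches.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    # Try every way of assigning len(a) of the pooled values to the first group.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical exam scores for two small groups.
tutored = [78, 82, 85, 90]
untutored = [60, 65, 70, 72]

p = permutation_p_value(tutored, untutored)
print(f"p = {p:.4f}; significant at 0.05: {p < 0.05}")
```

Enumerating every regrouping is feasible only for small samples; larger studies typically use random resampling or a formula-based test instead.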
Structured Data is data that adhere to a predefined model and conform to a tabular format with relationships between rows and columns. Common examples include Excel files or SQL databases (Enterprise Big Data Framework, 2019).
Unstructured Data is data that do not have a predefined data model or are not organized in a predefined manner. Common examples include audio files, video files, or NoSQL databases (Enterprise Big Data Framework, 2019).
References
Analytics: What it is and why it matters. (n.d.). Retrieved from https://www.sas.com/en_us/insights/analytics/what-is-analytics.html
Brown, J. (2021). Data vs. Information vs. Insight. Retrieved from https://online.ben.edu/programs/mba/resources/data-vs-information-vs-insight
Enterprise Big Data Framework (2019). Data Types: Structured vs. Unstructured Data. Retrieved from https://www.bigdataframework.org/data-types-structured-vs-unstructured-data/
Everitt, B. S., & Skrondal, A. (2010). The Cambridge dictionary of statistics (4th ed.). Retrieved from http://www.stewartschultz.com/statistics/books/Cambridge%20Dictionary%20Statistics%204th.pdf
4 Types of data analytics and how to apply them. (2019, October 8). Retrieved from https://www.michiganstateuniversityonline.com/resources/business-analytics/types-of-data-analytics-and-how-to-apply-them/
Gartner (2021). Gartner Glossary. Retrieved from https://www.gartner.com/en/information-technology/glossary
IBM (n.d.). IBM Cloud Learn Hub. Retrieved from https://www.ibm.com/cloud/learn
Mehta, A. (2017, October 13). Four types of business analytics to know. Analytics Insight. https://www.analyticsinsight.net/four-types-of-business-analytics-to-know/
NCES (n.d.). About IPEDS. Retrieved from https://nces.ed.gov/ipeds/about-ipeds
NCES (n.d.). About Us. Retrieved from https://nces.ed.gov/about/
NCES (n.d.). IPEDS Survey Methodology. Retrieved from https://nces.ed.gov/ipeds/ReportYourData/IpedsSurveyMethodology#sec2
Webber, K. L., & Zheng, H. Y. (2020). Big data on campus: Data analytics and decision making in higher education. Baltimore: Johns Hopkins University Press.