Introduction to Data Ethics
Every day, more than 2.5 exabytes (1 exabyte = 1 billion gigabytes) of data are produced. By the end of 2025, estimates suggest that daily figure will rise to 436 exabytes. Data are truly everywhere, and their use is expanding at almost unimaginable rates. As the use of data grows, so does the likelihood of data misuse. Data misuse may be incidental or intentional; in either case, the potential for harm exists.
Incidental misuse of data results when data obtained for one purpose are used for an unrelated purpose and, in the process, uncover information that was never volunteered. One of the classic examples of incidental misuse occurred in the early 2010s, when the Target Corporation used shoppers’ purchase data to identify pregnant customers in order to personalize coupon mailings. The shoppers never told Target they were pregnant, but the company had developed a model that could identify customers who were likely to be. You can imagine the fallout, especially given how sensitive health information and pregnancy can be.
Intentional misuse of data may be clear-cut or subtle. On occasion, professional sports teams have been accused of using stolen data to gain an advantage. Some financial organizations have used data to target potential customers with specific vulnerabilities. Investigative agencies have been accused of misusing data to “profile” potential suspects. A more subtle form of intentional misuse can occur during data visualization or data-based storytelling. Injudicious choices in graphics can overemphasize positive aspects of data while minimizing the less appealing information.
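One common graphic choice of this kind is starting a bar chart's value axis above zero. The short Python sketch below shows the arithmetic behind that effect. The function name, the two rates, and the axis starting points are all hypothetical and were invented for illustration; they do not come from any real dataset.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """Return how many times taller the larger bar is drawn than the
    smaller one when the value axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical values, invented for illustration only.
rate_last_year = 50.0
rate_this_year = 55.0

# Axis starting at zero: bar heights match the true 10% difference.
honest = apparent_ratio(rate_last_year, rate_this_year, axis_start=0.0)

# Axis starting at 45: the same data now look like a doubling.
truncated = apparent_ratio(rate_last_year, rate_this_year, axis_start=45.0)

print(f"Axis starting at 0:  second bar is {honest:.1f}x as tall")
print(f"Axis starting at 45: second bar is {truncated:.1f}x as tall")
```

In this sketch the underlying numbers never change; only the axis does. A reader who glances at the truncated chart could easily come away with a much stronger impression of improvement than the data support.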
Data misuse can be avoided through a deliberate use of data ethics. Floridi and Taddeo defined data ethics as follows:
… new branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g. right conducts or right values). (2016, p. 1)
Under this definition, applying data ethics becomes a proactive process in which ethical considerations are at the forefront of each step a data professional undertakes. The AIR Statement of Ethical Principles, released in 2019, reflects the ideals of data ethics espoused by Floridi and Taddeo. Consider some of the key words highlighted in the Principles. Each of these terms is proactive: it asks which choice is the right one and whether that choice falls in line with commonly accepted values.
A variety of articles on data ethics is available online. Book suggestions include Weapons of Math Destruction (O’Neil, 2017) and 97 Things About Ethics Everyone in Data Science Should Know (Franks, Ed., 2020).
| AIR Statement of Ethical Principles | Definition |
| --- | --- |
| We act with integrity | "adherence to moral and ethical principles; soundness of moral character; honesty" |
| We protect privacy (a) and maintain confidentiality (b) | (a) "the state of being free from unwanted or undue intrusion or disturbance in one's private life or affairs," (b) "the state of keeping or being kept secret or private" (referring to data) |
| We act as responsible data stewards | "A person whose responsibility it is to take care of something." |
| We provide accurate (a) and contextualized (b) information | (a) "correct in all details," (b) includes "the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed" |

Note. Definitions from www.dictionary.com or www.lexico.com.
Kelly D. Smith, Ed.D., is a Senior Research Analyst at Central Piedmont Community College, where she supports student success and equity initiatives through data collection, analysis, and reporting that enable data-informed decision-making. In addition to her role in Institutional Research, Kelly uses her doctorate in Adult and Community College Education to design and develop data literacy training in collaboration with the Center for Teaching and Learning Excellence.