In its purest form, data manipulation is simply the process of changing data to make it easier to read or better organized. For example, a log of data could be sorted into alphabetical order, making individual entries easier to locate. But what happens when data manipulation is not handled ethically? Controversies around Cambridge Analytica, Facebook, international fraudsters, and identity thieves have made us aware of how technology allows our data to be manipulated.
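That benign sense of "manipulation" can be as simple as sorting. A minimal sketch in Python (the log entries here are hypothetical examples):

```python
# Benign data manipulation: sorting log entries alphabetically
# so individual entries are easier to locate.
log_entries = [
    "Smith, J. - interview 2",
    "Adams, K. - interview 1",
    "Jones, M. - focus group",
]

# Sorting is pure reorganization: no entry is altered, only the order.
sorted_entries = sorted(log_entries)
print(sorted_entries)
```

Nothing about the data itself changes; the same entries are simply easier to find.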

As researchers, it’s vitally important that we’re aware of what it means to acquire and handle data ethically, especially in the face of constantly evolving technology. In this piece, we’ll look at how data can be manipulated, what it means to be ethical, and how data manipulation can raise questions for researchers when it comes to technology solutions.

Data manipulation in the wild

No stranger to controversy, social media giant Facebook has long walked a fine line between protecting and exploiting the data in its custody. In 2014, the organization came under fire when it was revealed that it had conducted a study of over 689,000 users without their knowledge or consent.

The study, conducted in 2012, involved manipulating users’ news feeds to remove either positive or negative sentiment posts over the course of approximately a week, and observing how those users then posted as a result. One test decreased users’ exposure to their friends’ ‘positive emotional content’, which resulted in fewer positive posts of their own. Another test reduced their exposure to ‘negative emotional content’, and the opposite happened.

The study concluded:

"Emotions expressed by friends, via online social networks, influence our own moods, constituting, to our knowledge, the first experimental evidence for massive-scale emotional contagion via social networks."
The outrage from the academic community, however, was centered on the conduct of the study, rather than on the issue of data privacy alone.

Had it been a purely observational study, it could be argued that users had consented by accepting the Facebook terms of service. This particular study, however, involved an intervention (i.e., the manipulation of the news feed) without the element of informed consent from the participants.

This in itself is not necessarily unethical, as studies with interventions can be permitted on the grounds that their research aims could not be achieved any other way. However, a number of standards must be met for such research to pass any kind of ethics test.

  • A lack of consent must be a necessary element in the research
  • There must be minimal risk to participants
  • The likely benefits must outweigh the potential harms
  • There must be a debriefing of participants, as well as affording them an opportunity to opt out of the study

In the case of Facebook’s study, these standards were not met, and it could reasonably be argued that the study was therefore unethical.

The potential for further misuse of this kind of data manipulation, beyond a study of its outcomes, is cause for concern. When the story broke in 2014, Clay Johnson, the founder of Blue State Digital (the firm that managed Obama’s online campaign for the US Presidency in 2008), asked, “Could the CIA incite revolution in Sudan by pressuring Facebook to promote discontent? Should that be legal? Could Mark Zuckerberg swing an election by promoting ‘upworthy’ posts from two weeks beforehand? Should that be legal?”

These are certainly all relevant questions which have come further into our consciousness and political discourse, given the somewhat turbulent and divided global political climate.

Data manipulation and research

What does this mean for researchers in academic institutions at all levels, particularly those interested in utilizing technology to further their work? Data is often ‘manipulated’ (in the truest sense of the word) to make it more usable with technology solutions that help researchers delve deeper into their sources.
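In practice, this benign manipulation often just means restructuring raw records into a form a tool (or a researcher) can query. A minimal sketch, with hypothetical participant data and field names:

```python
import csv
import io

# Hypothetical raw export: one line per interview response.
raw = """participant,question,answer
P01,Q1,Agree
P02,Q1,Disagree
P01,Q2,Neutral
"""

# Restructure into a per-participant dictionary so all of one
# participant's answers can be looked up at once.
by_participant = {}
for row in csv.DictReader(io.StringIO(raw)):
    by_participant.setdefault(row["participant"], {})[row["question"]] = row["answer"]

print(by_participant["P01"])  # {'Q1': 'Agree', 'Q2': 'Neutral'}
```

No answer has been changed; the data has only been reshaped so it can be interrogated more easily.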

Researchers understand the imperative of being ethical in all research, but when it comes to technology designed to make decisions on your behalf using algorithms and artificial intelligence, you could be forgiven for feeling like you’re taking a leap of faith into unknown territory.

The key to remaining on the right side of ethical standards, and being able to utilize technology as it becomes available to you, is transparency and control.

For example, automated transcription has long been on the wish list of many qualitative and mixed methods researchers, who have either spent long hours of their own time transcribing interview data or struggled to find research assistants to do it on their behalf. Advances in artificial intelligence (AI) and natural language processing have now made this a reality, and human-powered transcription is no longer a researcher’s only option.

One of the advantages of transcription powered by AI and natural language processing is the transparency of your final source. The transcription is verbatim, rather than a human interpretation or summary of what was said. This means that when it comes to analysis, you’re working with a verbatim written version of your recorded audio source.

Transparency and control are key

Taking an ethical approach to your research while also taking advantage of the technology offered to you is a matter of maintaining transparency and control over your sources.

In a digital climate plagued by data scandals, privacy issues and a distinct lack of transparency, it’s imperative that the research community is not excluded from the use of new technologies, but that those technologies are developed in a way that maintains the high standards expected by researchers.

Learn more about research transparency today in this free whitepaper Transparency in an Age of Mass Digitization and Algorithmic Analysis.