In its purest form, data manipulation is the process of changing data to make it easier to read or better organized. For example, a log of data could be sorted into alphabetical order, making individual entries easier to locate. But what happens when data manipulation is not handled ethically? Controversies around Cambridge Analytica, Facebook, international fraudsters, and identity thieves have made us aware of how technology allows our data to be manipulated.
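Sorting a log alphabetically, the benign kind of manipulation described above, is easy to show in code. The minimal Python sketch below assumes the log is simply a list of name strings (the entries and the function names `organize` and `locate` are hypothetical). Sorting reorders the entries without altering any of them, and a sorted list can then be searched with a binary search instead of being scanned entry by entry.

```python
from bisect import bisect_left


def organize(log):
    """Return a new, alphabetically sorted copy of the log.

    Entries are only reordered: nothing is added, removed or altered.
    """
    return sorted(log)


def locate(sorted_log, entry):
    """Binary-search a sorted log; return the entry's index, or -1 if absent."""
    i = bisect_left(sorted_log, entry)
    if i < len(sorted_log) and sorted_log[i] == entry:
        return i
    return -1


# Hypothetical log entries, for illustration only.
log = ["walker", "adams", "nguyen", "baker"]
ordered = organize(log)
print(ordered)                    # ['adams', 'baker', 'nguyen', 'walker']
print(locate(ordered, "nguyen"))  # 2
print(locate(ordered, "zhou"))    # -1
```

Because `sorted` returns a new list, the original `log` is left exactly as it was.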
As researchers, we must be aware of what it means to acquire and handle data ethically, especially in the face of constantly evolving technology. In this piece, we’ll look at how data can be manipulated, what it means to be ethical, and the questions data manipulation raises for researchers when it comes to technology solutions.
Data manipulation in the wild
No stranger to controversy, social media giant Facebook has walked a fine line between protecting and exploiting the data in its custody. In 2014, the organization came under fire when it was revealed that, in 2012, Facebook had conducted a study of over 689,000 users without their knowledge or consent.
The study manipulated users’ News Feeds for approximately one week, reducing the number of posts with either positive or negative sentiment that they saw, to observe how the users themselves then posted. One test reduced users’ exposure to their friends’ ‘positive emotional content’, which resulted in fewer positive posts of their own. Another reduced their exposure to ‘negative emotional content’, and the opposite happened.
The study concluded:
"Emotions expressed by friends, via online social networks, influence our own moods, constituting, to our knowledge, the first experimental evidence for massive-scale emotional contagion via social networks."
The outrage from the academic community, however, centered on the conduct of the study rather than on data privacy alone.
It was not merely an observational study, for which consent could arguably be inferred from users’ acceptance of Facebook’s terms of service. This particular study involved an intervention (i.e., the manipulation of the News Feed), and its participants had not given informed consent.
This in itself is not necessarily unethical, as interventional studies conducted without prior consent can be permitted on the grounds that the research aims could not be achieved any other way. However, such research must meet a number of standards to pass any kind of ethics review:
A lack of consent must be a necessary element in the research
There must be minimal risk to participants
The likely benefits must outweigh the potential harms
Participants must be debriefed and given an opportunity to opt out of the study
In the case of Facebook’s study, these standards were not met, so it could reasonably be argued that the study was unethical.
The potential for further misuse of this kind of data manipulation, beyond studying its effects, is cause for concern. When the story broke in 2014, Clay Johnson, a co-founder of Blue State Digital, the firm that managed Barack Obama’s online campaign for the 2008 US presidential election, asked, “Could the CIA incite revolution in Sudan by pressuring Facebook to promote discontent? Should that be legal? Could Mark Zuckerberg swing an election by promoting ‘upworthy’ posts from two weeks beforehand? Should that be legal?”
These questions have only grown more relevant in our public consciousness and political discourse, given the somewhat turbulent and divided global political climate.
Data manipulation and research
What does this mean for researchers in academic institutions at all levels, particularly those interested in using technology to further their research? Data is often ‘manipulated’ (in the truest sense) to make it more usable with technology solutions that help researchers delve deeper into their sources.
Researchers understand the imperative of being ethical in all research, but when it comes to technology designed to make decisions on your behalf using algorithms and artificial intelligence, you could be forgiven for feeling like you’re taking a leap of faith into unknown territory.
The key to remaining on the right side of ethical standards while using technology as it becomes available to you is transparency and control.
For example, automated transcription has long been on the wish list of many qualitative and mixed methods researchers, who have either spent many long hours transcribing interview data themselves or struggled to find research assistants to do it on their behalf. Advances in artificial intelligence (AI) and natural language processing technology have now made this a reality, and human-powered transcription is no longer a researcher’s only option.
One of the advantages of transcription powered by AI and natural language processing is the transparency of the final source. The transcript is verbatim rather than a human’s interpretation or summary of what was said, so when it comes to analysis, you’re working with a verbatim written version of your recorded audio.
Transparency and control are key
Taking an ethical approach to your research, whilst also taking advantage of the technology available to you, is a matter of maintaining transparency and control over your sources.
In a digital climate plagued with data scandals, privacy issues and a distinct lack of transparency, it’s imperative that the research community is not excluded from using new technologies, and that those technologies are developed in a way that maintains the high standards expected by researchers.
In the 2013 film ‘Her’, Joaquin Phoenix’s character, Theodore, falls in love with ‘Samantha’, an artificially intelligent operating system able to communicate with him in a language he can understand, and sets about building a meaningful relationship with her.
When the film was released, Apple’s Siri had been on the market, and in users’ hands, for about two years, so the concept of speaking to a ‘smart’ device and having it speak back wasn’t entirely foreign to audiences. In the world Spike Jonze created in ‘Her’, this technology had evolved far enough that a human could develop a real emotional connection to it.
In reality, we’re not quite at the point where an exchange with your computer or smart device might lead to romantic feelings, but the film does make us consider where the technology is headed.
What is Natural Language Processing?
Siri, Alexa, the Google Assistant, Cortana, and any other ‘virtual assistant’ you might be used to speaking to are powered by artificial intelligence (AI) and natural language processing (NLP). It’s NLP that has turned human communication with computers on its head. For decades, we’ve needed to communicate with computers in their own language, but thanks to advances in AI and NLP technology, we’ve taught computers to understand us.
In a technical sense, NLP is a form of artificial intelligence that helps machines “read” text by simulating the human ability to understand language. NLP techniques incorporate a variety of methods that enable a machine to understand what’s being said or written in human communication comprehensively, not just as individual words. These methods draw on linguistics, semantics, statistics and machine learning to extract meaning and resolve ambiguities in language.
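As a toy illustration of why NLP looks beyond individual words, the Python sketch below splits a sentence into lowercase word tokens and then pairs each token with its neighbor. It is a deliberate simplification under stated assumptions, not how production NLP systems work: the pair `('not', 'good')` carries a meaning that neither word has on its own, and real systems capture this kind of context with statistical and machine learning models rather than simple word pairs. The function names `tokenize` and `bigrams` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def bigrams(tokens):
    """Pair each token with the one that follows it, keeping local context."""
    return list(zip(tokens, tokens[1:]))


sentence = "The interview was not good, but the data was useful."
tokens = tokenize(sentence)
print(tokens)
# ['the', 'interview', 'was', 'not', 'good', 'but', 'the', 'data', 'was', 'useful']
print(bigrams(tokens)[2:4])
# [('was', 'not'), ('not', 'good')]
```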
How is it used?
Frequently used in online customer service and technical support, chatbots let customers speak to ‘someone’ without waiting on the telephone, answering their questions and directing them to relevant resources and products 24 hours a day, seven days a week.
To be effective, chatbots must be fast, smart and easy to use, especially in customer service, where user expectations are high and, if users are experiencing a technical issue, patience may be low. To deliver the expected level of service, chatbots are built using NLP so that they can understand language, usually in text or voice interactions, where users communicate in their own words as if they were speaking (or typing) to a real person. Integration with semantic and other cognitive technologies that enable a deeper understanding of human language allows chatbots to get even better at understanding and replying to more complex and longer-form requests.
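The core step of a customer-service chatbot, mapping a user’s own words to what they want, can be sketched in Python with simple keyword overlap. This is a deliberately minimal stand-in for the trained NLP models described above, not how any particular chatbot works: the intent names, keywords and replies are hypothetical, and a message that matches no keyword is handed off to a human rather than guessed at.

```python
import re

# Hypothetical intents and trigger words, for illustration only.
INTENTS = {
    "reset_password": {"password", "reset", "login", "locked"},
    "billing": {"invoice", "bill", "charge", "refund"},
    "shipping": {"delivery", "shipping", "track", "parcel"},
}

# Hypothetical canned replies, one per intent plus a human handoff.
REPLIES = {
    "reset_password": "You can reset your password from the login page.",
    "billing": "Let me pull up your latest invoice.",
    "shipping": "Here is the tracking link for your order.",
    "handoff": "Let me connect you with a member of our team.",
}


def classify(message):
    """Return the intent whose keywords overlap most with the message's words."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scores = {intent: len(words & keywords) for intent, keywords in INTENTS.items()}
    # max() returns the first intent listed when scores tie.
    best = max(scores, key=scores.get)
    # No keyword matched at all: escalate rather than guess.
    return best if scores[best] > 0 else "handoff"


def reply(message):
    """Look up the canned reply for the message's intent."""
    return REPLIES[classify(message)]


print(classify("I'm locked out and forgot my password"))  # reset_password
print(classify("Where is my parcel? I want to track it"))  # shipping
print(reply("Can I speak to someone?"))  # Let me connect you with a member of our team.
```

Keyword overlap breaks down quickly on paraphrases and longer requests, which is exactly the gap the semantic technologies mentioned above are meant to close.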
In a research context, we’re now seeing NLP technology applied in automated transcription services such as NVivo Transcription. Transcription is one of the most time-intensive tasks for qualitative and mixed methods researchers, many of whom transcribe their interview and focus group recordings by hand. Unless you’re an exceptionally fast and accurate typist, this is a laborious task that takes time away from the actual analysis of your data.
Automated transcription tools use speech recognition and NLP technology to ‘listen’ to recordings such as interviews and focus groups, interpret them, and produce a transcript in a format and language the researcher can go on to analyze, either manually or using software.
Future uses of NLP
The global NLP market has been estimated to grow to USD 16.07 billion by 2021, a strong indication that NLP technology has significant growth opportunities across a number of sectors.
An understanding of human language can be especially powerful when applied to extract information and reveal meaning or sentiment in large amounts of text-based content (or unstructured information), particularly the types of content that have traditionally been examined manually by people.
Analysis that accurately captures the subtleties of language, such as word choice or tone, can provide useful knowledge and insight. NLP will play an important part in the continued development of tools that assist with the classification and analysis of data, with accuracy likely to improve as the technology evolves.
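To make word choice concrete, here is a minimal lexicon-based sentiment scorer in Python. It is a hypothetical sketch, not a description of any particular tool: the word scores are invented for illustration, and the only nuance it handles is flipping the polarity of a word that follows a negator such as “not”. Real tools rely on far larger curated lexicons or trained models.

```python
import re

# Hypothetical word scores, invented for illustration.
LEXICON = {
    "great": 2, "good": 1, "helpful": 1, "useful": 1,
    "terrible": -2, "bad": -1, "slow": -1, "confusing": -1,
}
NEGATORS = {"not", "never", "no"}


def sentiment(text):
    """Sum word scores, flipping the sign of a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # e.g. "not good" counts as negative
        score += value
    return score


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical customer comments.
comments = [
    "The support team was great and very helpful",
    "The app is not good and the checkout is slow",
    "It was not bad",
]
for comment in comments:
    print(label(sentiment(comment)), sentiment(comment), comment)
```

The “not bad” comment shows both the value and the limit of this approach: negation handling turns it positive, but a word list cannot tell whether it was meant as praise or faint criticism, which is where tone and context come in.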
Academics at the University of Bologna have applied NLP to arguably the most used part of any academic article: the bibliography. A group of researchers is developing tools that extract information on citations using natural language processing and common ontologies (representations of concepts and their relationships) that can be openly accessed and connected to other sources of information. The aim of the project is to enrich the bibliography so that readers get more comprehensive information about each individual entry, instead of treating the bibliography as one large piece of information.
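The idea of treating each bibliography entry as its own piece of information can be sketched in Python. This is not the Bologna project’s method, which the paragraph describes only in terms of NLP and shared ontologies. The sketch simply assumes one entry per line and uses regular expressions to pull out a publication year and a DOI, turning a block of text into one structured record per entry. The entries, which use the placeholder DOI prefix 10.5555, and the function names are hypothetical.

```python
import re

# Simplified patterns (assumptions): a DOI starts with "10.", a registrant
# code of 4-9 digits and a slash; a year is four digits in parentheses.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
YEAR_RE = re.compile(r"\((\d{4})\)")


def parse_entry(entry):
    """Turn one reference string into a small structured record."""
    doi = DOI_RE.search(entry)
    year = YEAR_RE.search(entry)
    return {
        "text": entry.strip(),
        "year": int(year.group(1)) if year else None,
        # Drop a sentence-ending full stop captured after the DOI.
        "doi": doi.group(0).rstrip(".") if doi else None,
    }


def parse_bibliography(text):
    """Assume one entry per non-empty line."""
    return [parse_entry(line) for line in text.splitlines() if line.strip()]


# Hypothetical entries, for illustration only.
bibliography = """
Doe, J. (2019). A hypothetical article. Journal of Examples, 1(2), 3-4. https://doi.org/10.5555/example.0001.
Roe, R. (2021). Another hypothetical work. Example Press.
"""
for record in parse_bibliography(bibliography):
    print(record["year"], record["doi"])
# 2019 10.5555/example.0001
# 2021 None
```

Once each entry is a record, its DOI can be resolved through doi.org and linked to other metadata, which is the kind of per-entry enrichment the project aims for.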
In the commercial world, NLP will be especially useful for analyzing the typically carefully worded language of annual reports, call transcripts and other investor-sensitive communications, as well as legal and compliance documents. Effective sentiment analysis of customer interactions will allow organizations to improve their product and service delivery.
NLP will be essential to the future of research
More effective and accurate understanding between humans and machines will strengthen the efficiency and output of anyone who needs to understand and analyze unstructured data.
No matter where it is applied, NLP will be essential in understanding the true voice of the research participant, the customer, or the user and facilitating more seamless interaction and interpretation on any platform where language and human communication are used.