
The start of a new year is your annual opportunity to try new things – including new research and analysis techniques. In 2025, make the most of Lumivero software to break through the data analysis status quo – and make breakthroughs in your work.
In our new Behind the Breakthroughs series, we will highlight experts across industries who have used Lumivero products to find innovative ways to go deeper with their data. Kicking off the series is a look at how a team of consumer science researchers used Lumivero’s XLSTAT statistical software to help solve a tricky problem in survey research: removing bad respondents from online survey data.
Bad Respondent Data – The Downside to Online Surveys
Online surveys can be a powerful tool for market researchers. However, common survey design techniques can lead to poor-quality data. For example, asking participants to rank items in order of importance or to rate the relative importance of different items may lead to issues like neutral response bias, in which survey respondents avoid rating items at the extreme ends of the scale. Presenting participants with long lists of items to rank can result in survey fatigue – participants get tired of answering questions and either stop giving well-considered answers or simply don’t finish the survey.
Even with a well-designed survey, researchers will need to clean data to remove bad respondents, and they’ll need to do it efficiently and without subjective decision-making. One survey design technique that can help with these issues is Best-Worst Scaling (BWS), also known as Maximum Difference Scaling, or MaxDiff.
What Is MaxDiff or Best-Worst Scaling?
The MaxDiff method of survey design involves asking respondents to consider a set of objects, then rank one item in the set as “best” (or “most important”) and another as “worst” (or “least important”). The survey usually presents respondents with three to five sets of objects to rank, and each set typically contains three to five objects to consider.
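To make the format concrete, here is a minimal sketch of one respondent’s MaxDiff task in Python. The cuisine items and picks are hypothetical – in practice, software like XLSTAT generates and manages the design for you.

```python
# A MaxDiff task: several small choice sets; for each set, the respondent
# marks one item "best" and one "worst". All data here is hypothetical.
choice_sets = [
    ["Italian", "Thai", "Mexican", "Indian"],
    ["Thai", "French", "Italian", "Japanese"],
    ["Mexican", "Japanese", "Indian", "French"],
]

# One respondent's answers: a (best, worst) pair per choice set.
responses = [
    ("Italian", "Mexican"),
    ("Italian", "Japanese"),
    ("Indian", "French"),
]

for items, (best, worst) in zip(choice_sets, responses):
    print(f"Shown {items}: best = {best}, worst = {worst}")
```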
The advantages of this type of analysis include:
- Avoiding survey fatigue – Survey respondents don’t have to rank or score a long list of items.
- Eliminating neutral response bias – Respondents must choose one item over another.
- Providing richer insights – Researchers gain information about relative preferences between groups, which can be more useful for decision-making than an isolated ranking.
One risk of MaxDiff analysis is that respondents may give inconsistent answers between sets. For example, when asked about cuisine preferences, a respondent may rank Italian as “best” and Thai as “worst” in one set, then flip in another, ranking Thai as best and Italian as worst.
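In a small data set, flips like this can be flagged with a few lines of code. A rough sketch – the helper below is hypothetical, not the method from the paper:

```python
def find_flips(responses):
    """Return item pairs that a respondent ranked in both directions,
    e.g. Italian over Thai in one set but Thai over Italian in another.
    `responses` is a list of (best, worst) tuples, one per choice set."""
    beats = {(best, worst) for best, worst in responses}
    return {tuple(sorted(pair)) for pair in beats if pair[::-1] in beats}

# The flip from the example above is caught; the third answer is not.
print(find_flips([("Italian", "Thai"), ("Thai", "Italian"), ("Indian", "French")]))
# -> {('Italian', 'Thai')}
```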
Quick checks like this make inconsistent respondents easy to spot in small data sets. For large ones, a more principled statistical analysis is necessary. But which technique is best?
Evaluating MaxDiff Data Cleaning Techniques
Working with food science researchers from Denmark and New Zealand, Dr. Fabien Llobell and Paulin Choisy of Lumivero’s XLSTAT team evaluated two different techniques for identifying bad respondents in data sets generated by MaxDiff surveys: root likelihood index (RLH) and normalized error variance (ErrVarNorm). Their paper, “Measurement and Evaluation of Participant Response Consistency in Case 1 Best-Worst-Scaling (BWS) in Food Consumer Science,” shows how XLSTAT can help researchers quickly and reliably evaluate respondent consistency to determine which participants to exclude from their final analyses.
The team looked at 18 different food consumer science studies that used MaxDiff-style surveys. With hundreds of responses to evaluate, a robust statistical analysis technique was essential.
Root Likelihood Index – Influenced by Number of Survey Choices
Root Likelihood Index (RLH) measures how consistent a participant’s responses are with the choice model fitted to the survey data. Calculating RLH for each participant yields a number between 0 and 1: the higher the number, the more consistent the participant’s answers – at least in theory.
The XLSTAT team found that RLH values varied widely depending on the number of options in the MaxDiff choice sets. Because of this, researchers need to adapt their interpretation to the number of options per set – and, moreover, even a perfectly consistent respondent can have an RLH lower than 1, since the model’s predicted choice probabilities never reach exactly 1. Unless the cutoff threshold is adjusted accordingly, surveys with more options per set can end up excluding a higher proportion of participants.
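In broad strokes, RLH is the geometric mean of the fitted model’s predicted probabilities for the choices a respondent actually made. A minimal sketch, assuming a multinomial logit model and made-up item utilities (in practice, XLSTAT estimates these for you):

```python
import math

def rlh(utilities, choice_sets, responses):
    """Root likelihood: geometric mean of the model's predicted
    probabilities for the respondent's observed best/worst picks.
    Assumes a multinomial logit form; `utilities` maps item -> utility."""
    probs = []
    for items, (best, worst) in zip(choice_sets, responses):
        exp_u = {i: math.exp(utilities[i]) for i in items}
        probs.append(exp_u[best] / sum(exp_u.values()))       # P(best pick)
        exp_neg = {i: math.exp(-utilities[i]) for i in items}
        probs.append(exp_neg[worst] / sum(exp_neg.values()))  # P(worst pick)
    return math.prod(probs) ** (1 / len(probs))

# Hypothetical utilities, for illustration only.
utilities = {"Italian": 1.2, "Thai": 0.4, "Mexican": -0.3, "Indian": 0.1}
sets = [["Italian", "Thai", "Mexican", "Indian"]]
answers = [("Italian", "Mexican")]
print(round(rlh(utilities, sets, answers), 3))
# -> roughly 0.46, even though the picks match the utilities perfectly:
# each predicted probability stays below 1, so RLH never reaches 1.
```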
Another mark against RLH is that it can take a significant amount of time to calculate, even with support from XLSTAT software!
Normalized Error Variance – A Direct Measure of Response Consistency
Normalized Error Variance (ErrVarNorm) measures how consistent a respondent’s answers are with each other, rather than with the model. Like RLH, ErrVarNorm returns values between 0 and 1 to indicate participant consistency. When analyzing data with ErrVarNorm, the Lumivero team found that the proportion of excluded participants stayed fairly consistent regardless of the number of objects in the MaxDiff choice set. ErrVarNorm also involved a simpler calculation. The result? A simpler, more reliable method for guiding your survey data cleaning – all accessible right within XLSTAT, which runs as a plug-in to Microsoft Excel.
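The paper’s exact formula isn’t reproduced here, but the underlying idea – scoring how well a respondent’s individual picks agree with their own overall pattern, with no fitted model in the loop – can be sketched with a hypothetical stand-in measure:

```python
def internal_error_rate(choice_sets, responses):
    """Hypothetical stand-in for an internal-consistency measure (NOT
    the paper's ErrVarNorm formula). Scores each item for one respondent
    as (#times best - #times worst), then counts picks that contradict
    those scores. Returns a value in [0, 1]; 0 = perfectly consistent."""
    score = {}
    for best, worst in responses:
        score[best] = score.get(best, 0) + 1
        score[worst] = score.get(worst, 0) - 1
    errors, picks = 0, 0
    for items, (best, worst) in zip(choice_sets, responses):
        # an inconsistency: some item in the set outscores the "best" pick...
        errors += any(score.get(i, 0) > score.get(best, 0) for i in items)
        # ...or scores below the "worst" pick
        errors += any(score.get(i, 0) < score.get(worst, 0) for i in items)
        picks += 2
    return errors / picks

sets = [["Italian", "Thai", "Mexican"], ["Italian", "Mexican", "Indian"]]
print(internal_error_rate(sets, [("Italian", "Mexican"), ("Italian", "Indian")]))
# -> 0.0 for this self-consistent respondent; flip-prone respondents score higher
```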
Start Breaking Through with XLSTAT
Find out how you can perform richer, more robust data analysis that leads to compelling insights (and cleaner data) – request a demo of XLSTAT today.