Kegan Clark, Senior Researcher at Riot Games, describes how best-worst scaling links survey responses to behavioral outcomes more closely, and delivers greater insight, than standard 5- or 10-point (Likert) scales. She’ll be one of the 40+ MRX industry speakers at the upcoming Sawtooth Software Conference (September 23-27) in San Diego.

At Riot, we use MaxDiff (best-worst scaling) to understand many aspects of how players feel about our products, as well as many characteristics of our audience. In addition to standard applications such as determining relative preference for concepts or product attributes, we use MaxDiff for some less common applications.

Because it avoids scale-use bias, MaxDiff gives us several advantages over Likert scales (e.g., 5- or 10-point ratings), such as the ability to compare results across regions (e.g., the USA and China) without worrying about cultural differences in how respondents interpret and use a rating scale. We believe this improves the signal-to-noise ratio of our surveys. Whenever we want to understand the relative preference for, or importance of, a set of concepts, we typically turn to MaxDiff.

For example, as a video game developer, we are constantly looking to make our games more fun to play. There are a huge number of things we could do to improve a game, but we have limited resources to do them. This means we have to prioritize the most important problems to fix or features to build. We use MaxDiff to learn which items, out of a list of dozens of potential problems, are most and least frustrating to players.
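To make the mechanics concrete, here is a minimal sketch of how answers to MaxDiff questions can be turned into a ranking with simple counting: each item's score is the number of times it was picked as most frustrating, minus the number of times it was picked as least frustrating, divided by the number of times it was shown. This is an illustration only, not a description of our actual pipeline; utilities are often estimated with more sophisticated models, and the item names, field names, and responses below are all hypothetical.

```python
from collections import Counter


def maxdiff_count_scores(tasks):
    """Return a count-based score per item: (most - least) / times shown.

    Each task is a dict with hypothetical keys:
      "shown": the items displayed in that question
      "most":  the item picked as most frustrating
      "least": the item picked as least frustrating
    """
    shown, most, least = Counter(), Counter(), Counter()
    for task in tasks:
        shown.update(task["shown"])
        most[task["most"]] += 1
        least[task["least"]] += 1
    return {item: (most[item] - least[item]) / shown[item] for item in shown}


# Made-up frustrations and made-up responses, for illustration only.
items = ["long queue times", "toxic chat", "client crashes", "unclear tutorial"]
tasks = [
    {"shown": items, "most": "client crashes", "least": "unclear tutorial"},
    {"shown": items, "most": "client crashes", "least": "long queue times"},
    {"shown": items, "most": "toxic chat", "least": "unclear tutorial"},
]

scores = maxdiff_count_scores(tasks)
# Sort from most to least frustrating.
ranked = sorted(scores, key=scores.get, reverse=True)
for item in ranked:
    print(f"{item}: {scores[item]:+.2f}")
```

Scores fall between -1 (always picked as least frustrating) and +1 (always picked as most frustrating), so items are forced to trade off against one another rather than all being rated "very frustrating."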

In the past, when we attempted to do this using Likert-scale questionnaires, the differentiation between items was very poor. Additionally, we could never find any meaningful difference in the behavioral outcomes of players who had strongly disagreed vs. strongly agreed with any of the statements. This suggested we were getting very little signal from the Likert questions. By using MaxDiff instead, we were able to find many frustrations for which high utility scores correlated with a decline in play time in the game. Knowing that has allowed us to prioritize fixing the most important drivers of churn first.
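As a rough illustration of the kind of check described above, the sketch below computes a Pearson correlation between players' utility scores for a single frustration and the change in their play time. It assumes individual-level utilities are available for each respondent and can be joined to behavioral data; the numbers are invented, and the variable names and the choice of Pearson correlation are assumptions made for the example rather than a record of the analysis we ran.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-player data, one entry per surveyed player.
# Utility for one frustration (higher = more frustrating to that player).
utility = [2.0, 1.0, 0.0, -1.0, -2.0]
# Change in weekly hours played after the survey (negative = playing less).
playtime_change = [-5.0, -3.0, 0.0, 1.0, 2.0]

r = pearson_r(utility, playtime_change)
print(f"correlation between utility and play-time change: {r:.2f}")
```

A clearly negative correlation for an item would mean that the players who find it most frustrating are also the ones whose play time is dropping, which is what makes it a candidate driver of churn. Correlation alone does not establish that the frustration causes the decline.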