Vanquish all bad data from my dataset!

Yes, we all try to do that. But are we doing it properly? Obtaining valid and reliable results from surveys requires good-quality data, but poorly designed, lengthy, and boring surveys cause respondents to disengage, leaving us with bad data and misguided decisions. Bad data wastes our time and our money.

When we look at the data quality of surveys, we’ve got to wade through the different kinds of people participating in the research. First, we desperately need to find the Lazy Larrys. We need to find the people who are masquerading as someone else. We want to find bots that randomly choose answer options. And, as much as we don’t want it to happen, we need to find people who will say anything in a survey to get to the tiny pot of gold at the end.

At the same time, we must recognize that there are a lot of good, honest, hard-working people who are trying their best when answering surveys. Good Faith Gail leads a busy life. She goes to work every day, takes the dog for walks, cooks and cleans for a family of five, cares for her aging mother, and parents the little ankle biters who, for some strange reason, constantly want mommy’s attention. Unfortunately, all of the pieces of her life mean that she can never truly pay complete attention to our surveys. As a result, she sometimes makes mistakes in her answers. She doesn’t mean to; it just happens.

What we really need to do is figure out how to separate Lazy Larry from Good Faith Gail. For this study, we designed a generic cereal survey containing many question types: horrid grids, poorly written questions, lots of “select all that apply” questions, open-ended questions, rating questions, and more. We gave our 8,000 respondents every possible opportunity to respond to the questions as they wished and, in the end, identified 30 possible errors they could make.

The great news is that 23 percent of our respondents made no errors. The bad news is that 77 percent made at least one error. In other words, 77 percent of our survey completes could have been classified as poor quality data and deleted: money in the garbage and increased field times. However, we know this isn’t fair. We know that Good Faith Gail’s responses are in that collection of poor data because she made one or two small errors. We need a way to tease apart the Gails from the Larrys.

To do this, we grouped our respondents into (1) Demons, who made many errors, and (2) Angels, who made only one or two errors. This segmentation let us see which errors the Demons tended to make and which the Angels tended to make.
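If you’re curious how that segmentation might look in practice, here’s a minimal sketch in Python. The column names, the sample data, and the cutoff for “many errors” are my own illustrative assumptions, not the exact values from our study:

```python
import pandas as pd

# Hypothetical respondent-level error counts; in practice these would be
# computed from the 30 individual error checks built into the survey.
df = pd.DataFrame({"respondent_id": [1, 2, 3, 4],
                   "error_count": [0, 2, 9, 14]})

def segment(errors: int) -> str:
    # "Angels" made only one or two errors; "Demons" made many.
    # The cutoff of 8 for "many" is an assumed, illustrative threshold.
    if errors <= 2:
        return "Angel"
    if errors >= 8:
        return "Demon"
    return "Middle"  # respondents who fall between the two groups

df["segment"] = df["error_count"].apply(segment)
print(df)
```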

Red herring questions are a favorite of researchers. All you have to do is insert a fake answer option, such as a made-up brand name, into one of your questions. Anyone who chooses it cannot possibly be giving a valid response.

In our survey, people did choose the red herring answer. In fact, about 6 percent of Angels and about 15 percent of Demons chose one. That difference of 9 points isn’t great enough to say with confidence that anyone who chooses a red herring is a bad respondent whose data should be removed. The brand names we make up often sound like real brand names, or they resemble local brands that don’t show up in a Web search. It would be completely unfair to delete 6 percent of the data just because people chose one red herring.

But here’s a completely different story. When we identified people who selected two red herring answers, it became clear who the Angels and Demons were. Only 1 percent of Angels chose two red herrings, while 40 percent of Demons did the same. This is a great indicator of poor data quality.
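For those who script their own checks, here’s a sketch of how you might flag respondents who fell for two or more red herrings. The question names and fake brands are invented for illustration:

```python
import pandas as pd

# Hypothetical data: the answer options each respondent selected on two
# "select all that apply" questions, each containing one fake brand.
df = pd.DataFrame({
    "q3_brands": [["Kellogg's", "Cheerios"], ["Morning Oats X"], ["Cheerios"]],
    "q7_brands": [["Post"], ["Crunchy Dawn"], ["Crunchy Dawn"]],
})

RED_HERRINGS = {"Morning Oats X", "Crunchy Dawn"}  # made-up brand names

def herrings_chosen(row) -> int:
    # Count how many distinct fake brands this respondent selected.
    chosen = set(row["q3_brands"]) | set(row["q7_brands"])
    return len(chosen & RED_HERRINGS)

df["red_herrings"] = df.apply(herrings_chosen, axis=1)
# Flag only respondents who fell for BOTH fakes, not just one.
df["flag_red_herring"] = df["red_herrings"] >= 2
```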

Speeding is another of our favorite data quality measures. In this study, about 8 percent of Angels and 40 percent of Demons were in the fastest half of the completion-time distribution. That alone is a nice difference between the groups, but I really don’t like deleting 8 percent of Angels just because they are computer literate, type well, read quickly, and have experience answering surveys.

However, if we ramp up the definition of speeding to the fastest 1.5 percent of completes, none of our Angels gets a speeding ticket, but 30 percent of our Demons do. In other words, the fastest 1.5 percent of respondents were all Demons. Delete them all and your Angels will be safe.
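A speeding flag at the fastest 1.5 percent is easy to compute once you have completion times. Here’s a rough sketch; the times are made up, and shorter is assumed to mean faster:

```python
import pandas as pd

# Hypothetical survey completion times, in seconds.
times = pd.Series([612, 540, 95, 480, 73, 710, 66, 520])

# The fastest 1.5 percent of completes sit below the 1.5th percentile
# of the completion-time distribution.
cutoff = times.quantile(0.015)

flag_speeder = times < cutoff
print(f"Speeding cutoff: {cutoff:.0f} seconds")
print(times[flag_speeder])
```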

Straightlining is also a commonly used indicator. Unfortunately, we are generally not good at implementing it. We often ignore best practices in question design, which means that every item in the grid is phrased in a positive way. When every item is positive, an honest respondent who simply likes the product can legitimately give the same answer down the whole column. That isn’t straightlining; it is bad question design. In our survey, about 14 percent of Angels ended up straightlining on a weak grid question. Sure, 60 percent of Demons also straightlined, but are you willing to automatically delete 14 percent of Angels just because they answered honestly?

Our survey also included a well-designed grid question. Respondents were again asked how much they agreed with each statement but, instead of phrasing every statement positively, half were phrased negatively: this cereal tastes good, it smells bad, it looks good, it’s expensive, the box is good. People answering honestly shouldn’t straightline, and that’s what our results showed. Only 3 percent of Angels straightlined on the well-designed question, whereas 60 percent of Demons did. If you write a best-practice grid question, then absolutely use straightlining as a data quality measure.
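Detecting straightlining on a grid like this is nearly a one-liner once the answers are in a table. A sketch, with hypothetical item names and a 5-point agreement scale:

```python
import pandas as pd

# Hypothetical 5-point agreement grid; items alternate positive and
# negative phrasing (e.g., "tastes good" vs. "smells bad").
grid_cols = ["tastes_good", "smells_bad", "looks_good",
             "is_expensive", "box_good"]
df = pd.DataFrame([[4, 2, 4, 3, 4],    # varied answers
                   [5, 5, 5, 5, 5],    # straightliner
                   [1, 1, 1, 1, 1]],   # straightliner
                  columns=grid_cols)

# A respondent straightlines when every item in the grid gets the same
# rating. With mixed phrasing, an honest respondent almost never will.
df["flag_straightline"] = df[grid_cols].nunique(axis=1) == 1
```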

A much less used data quality question is the rank question. Unfortunately, it is confounded by the fact that basic numeracy and literacy skills are not always what they should be. In this case, we asked our respondents to rank-order eight items from 1 to 8. About 8 percent of Angels and 50 percent of Demons failed this measure by choosing the same number twice or by choosing a number outside the 1-to-8 range. The fact that 8 percent of Angels failed does make me a bit nervous, but when 50 percent of Demons fail, I have to pay attention.
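Validating a rank question amounts to checking that each respondent’s answers form a permutation of 1 through 8. A sketch with invented data:

```python
import pandas as pd

# Hypothetical rank answers: eight items, each supposed to receive a
# unique rank from 1 to 8, typed in by the respondent.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8],   # valid permutation
                   [1, 1, 2, 3, 4, 5, 6, 7],   # duplicate rank
                   [1, 2, 3, 4, 5, 6, 7, 9]],  # out-of-range rank
                  columns=[f"rank_item_{i}" for i in range(1, 9)])

def valid_ranking(row) -> bool:
    # A valid answer uses each of 1..8 exactly once.
    return sorted(row) == list(range(1, 9))

df["flag_bad_rank"] = ~df.apply(valid_ranking, axis=1)
```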

We also looked at a variety of contradictions across different types of questions within the survey. For example, a household of five people should probably spend more than $10 a week on groceries. In this case, about 8 percent of Angels contradicted themselves just once, and about 40 percent did so twice. With an Angel failure rate that high, it doesn’t make sense to delete their data, even though about 74 percent of Demons contradicted themselves twice. Unfortunately, contradictions are not a guaranteed way to separate Angels from Demons.
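Contradiction checks are just paired logic rules. Here’s an illustrative sketch of the groceries example; the household-size and $10 figures come from the rule above, and everything else is assumed:

```python
import pandas as pd

# Hypothetical pairs of answers that can contradict each other, such as
# household size versus weekly grocery spend.
df = pd.DataFrame({"household_size": [5, 2, 6],
                   "weekly_grocery_spend": [8.50, 150.00, 9.00]})

# Illustrative rule: a household of five or more people almost certainly
# spends more than $10 a week on groceries.
df["contradiction_groceries"] = (
    (df["household_size"] >= 5) & (df["weekly_grocery_spend"] <= 10)
)
# In practice you'd build several such checks and count how many each
# respondent trips, since one contradiction alone proves little.
```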

Open-ends are highly sought after, since respondents use those sections to tell us things they couldn’t say elsewhere. They, too, are helpful when it comes to data quality. When we asked respondents to list three reasons for their choice, about 1 percent of Angels wrote fewer than 10 characters, compared with 57 percent of Demons. This single measure might delete about 1 percent of our good data, but it would also delete a great portion of the poor data.
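The character-count check is equally simple. A sketch, assuming the open-end text already sits in a column:

```python
import pandas as pd

# Hypothetical open-end: "List three reasons for your choice."
answers = pd.Series(["Tastes great, cheap, and the kids love it",
                     "asdf",
                     "Healthy, crunchy, good value"])

# Flag answers under 10 characters (after trimming whitespace); real
# projects might also screen for gibberish or pasted duplicates.
flag_short_open_end = answers.str.strip().str.len() < 10
```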

Finally, let’s consider a brand recognition question. We might assume that most people would recognize Kellogg’s, Adidas, iPhone, and more from a list of well-known brands. Yet about 4 percent of Angels indicated that they knew only one or two of the brands, compared with 48 percent of Demons. Perhaps expecting everyone to know global brands is unreasonable. Respondents could have simply misread or missed an option. New immigrants might not know them all. Non-shoppers or non-consumers might not recognize every brand. We simply can’t assume that our survey respondents have the same life experiences that we do or know all the same brand names we do. If we were to delete data based solely on this one question, we would lose 4 percent of Angels. Not terrible, but not good either.

From this article, I hope a few things have come to mind for you. First, although 23 percent of our respondents answered the survey perfectly, they are the minority. I’m not perfect, you’re not perfect, and there’s no way we should expect our respondents to be perfect. We know that many respondents are acting in good faith. They might have kids at their feet, a boss on the phone, or a spouse calling them to dinner. Any of these things means we don’t have their full attention. And honestly, why should they give us 100 percent of their attention? We do offer incentives but, for the most part, they come nowhere close to what a respondent could earn in the same time at a full-time job. I’d love to know how many of you were checking emails, watching TV, or doing something else while reading this. If you can’t pay 100 percent attention to this one thing, why should anyone else?

We need to improve how we write our surveys. Historically, our data quality practices have been weak. We’ve used only one or two questions in attempts to pick out people who weren’t paying attention. Very often, we used weak grid questions where every single item was phrased in a positive way, making it nearly impossible for an honest respondent not to straightline. We’ve failed to recognize that real human beings get distracted and make mistakes. Even well-intentioned, honest, reliable people, like your mom or grandpa.

To summarize, here is what we can and should do. If you want to penalize Demons, use a red herring question, but make sure you incorporate two red herring items, not just one. In addition, not instead, make sure your grids incorporate both negative and positive phrasing so that you can validly measure straightlining. Also, see if you can find a place where a ranking question is appropriate; Demons have high failure rates on ranking questions, so we should use that to our advantage. Finally, do a simple character count on open-ends to which everyone ought to have a reasonably lengthy response.

The most important thing is to measure multiple errors. We know Angels make all kinds of mistakes, not just the obvious, simple, or lazy ones, but Angels will not make three, four, or five of them. The next time you prepare a survey, make sure it’s designed to penalize Demons, not their Angel counterparts.
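Putting it all together, here’s a sketch of the multiple-error approach: count how many independent flags each respondent trips and remove only those who trip several. The three-flag threshold is an illustrative assumption, not a rule from our study:

```python
import pandas as pd

# Hypothetical per-respondent quality flags built from checks like the
# ones sketched above.
df = pd.DataFrame({
    "flag_red_herring":    [False, False, True],
    "flag_speeder":        [False, False, True],
    "flag_straightline":   [False, True,  True],
    "flag_bad_rank":       [False, False, True],
    "flag_short_open_end": [False, False, True],
})

# Count how many independent checks each respondent fails. An Angel may
# trip one flag by accident; tripping several is the Demon pattern.
flag_cols = [c for c in df.columns if c.startswith("flag_")]
df["error_count"] = df[flag_cols].sum(axis=1)
df["remove"] = df["error_count"] >= 3  # assumed threshold
print(df[["error_count", "remove"]])
```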