When the police arrive at a crime scene, sometimes it is very clear who they need to apprehend - just look for the guy with the gun. At other times, distinguishing the “good guys” from the “bad guys” takes more work. One could arrest everyone and sort it out later, but that leads to a lot of unhappy citizens. In our current legal system, due process ensures efforts are made to identify the culprit before anyone is arrested. Panel companies face this decision every day when policing their panels for duplicate respondents.
Lately, a lot of research has been published on panel duplication. Are these people sincere respondents who simply like to join many panels, or are they cheaters trying to game the system? What can we do to separate the two groups? Ultimately, when I buy a sample of 300, does that mean 300 unique people, or 100 people each answering three times? The purpose of this article is to assure you of the following:
- With the advent of machine-identifiable information, programming and hosting companies can and should identify duplicate respondents in a survey.
- Internal panel duplicates are likely “criminals” and have no place in a well-managed panel.
- Cross-panel duplicates are likely “innocent bystanders” who provide accurate information but should not be counted twice.
Magnitude of the Problem
Some panels are susceptible to “bots” or other cheaters who sign up on their panel many times because no security measures are enforced. Clearly, a sample of 300 from these panels does not mean 300 different people. Any conclusions based on data collected from these sub-par panel companies are highly suspect. Furthermore, these panel companies allow concerns about online research to linger when they should have been put to rest long ago.
On the other hand, well-managed panels with appropriate identity-verification techniques have already “put away” the majority of these criminals. Until recently, however, cross-panel duplication was still a problem when using multiple vendors: the panel companies providing sample were unwilling to give panelists’ personally identifiable information to the company hosting the survey, for fear of poachers. With the advent of machine-identifiable information, well-managed panel companies can now tackle the cross-panel duplication problem.
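The machine-identifiable approach can be pictured as hashing a bundle of device attributes into an anonymous fingerprint that the hosting company can compare across vendors without ever seeing personally identifiable information. The sketch below is a minimal illustration under that assumption; the attribute names and helper functions are hypothetical, and real fingerprinting services combine many more signals:

```python
import hashlib

def machine_fingerprint(attributes: dict) -> str:
    """Hash machine-identifiable attributes (OS, browser, screen size, etc.)
    into an anonymous fingerprint. Hypothetical attribute names; real
    services combine many more signals and fuzzier matching."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def flag_cross_panel_duplicates(respondents: list) -> list:
    """Return the ids of respondents whose fingerprint was already seen,
    regardless of which vendor supplied them."""
    seen = {}
    duplicates = []
    for resp in respondents:
        fp = machine_fingerprint(resp["machine"])
        if fp in seen:
            duplicates.append(resp["id"])
        else:
            seen[fp] = resp["id"]
    return duplicates
```

Because only the hash crosses company boundaries, no vendor has to expose its panelists’ identities to detect a shared device.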
We recently conducted a blinded study among eight high-quality panel providers (four regular access panels, two panel aggregators and two reward-system panels) to examine the magnitude and proper treatment of cross-panel duplication. The study was split into two feasibility levels, since the duplication rate would likely vary with feasibility: a national sample (high feasibility) and an Austin DMA sample (low feasibility). As expected, the magnitude of the duplication rate depended on both the feasibility and the type of online access panel involved.
Because we used only high-quality panels, the internal duplication rate was minuscule. Only vendor E had any duplicates in the high-feasibility study; slightly more internal duplicates occurred in the low-feasibility study. In general, the access panels (A-D) had lower internal duplication than the panel aggregators (E-F) or the reward-system panels (G-H). (See figures on page 7).
Unlike internal duplication, cross-panel duplication is not solved simply by using high-quality panels. It should not be ignored, and it can be rampant in studies where feasibility is tight. Also of interest: the reward-system panels appear to be tapping a genuinely new source of panelists, because their cross-panel duplication rate is so low. Lastly, the graph shows that the magnitude of the problem is too large to ignore, so we need to decide what action to take. Should we remove duplicates entirely from our data (treat them as “criminals”) or keep their first response (treat them as “innocent bystanders”)?
Answer to the Problem
Separating the “innocent bystanders” from the “criminals” in online research can be very difficult because no interviewer is present. We tested the quality of these panelists on three criteria:
- Traditional Satisficing Measures
- Response Consistency
- Data Differences
On the whole, cross-panel duplicates and unique respondents showed similar satisficing measures. Duplicate respondents tended to complete the survey in less time, but tenure could explain this difference: although we did not collect tenure information, these duplicates may simply be more accustomed to taking surveys. (See table on page 8).
Cross-panel duplicates answered the survey at least twice, so their answers could be examined for consistency. Because we counted a match only when the answers were identical, 100% consistency was unlikely even for good respondents - even a careful respondent will not necessarily answer identically on a grid of 10-point scales. The consistency we found was remarkable: 38 of the 161 duplicates answered with at least 90% consistency, and an overwhelming majority (93%) answered with over 50% consistency. We concluded that cross-panel duplicates give careful thought to each question, because they give the same answers the second time through. (See figure above).
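The consistency measure described above - the share of questions answered identically across a duplicate’s two completes - can be sketched as a small function. This is a hypothetical helper for illustration, not the study’s actual code:

```python
def consistency(first: list, second: list) -> float:
    """Fraction of questions answered identically on both completes.
    A 'match' requires an exact answer match, as in the study."""
    if len(first) != len(second):
        raise ValueError("both completes must cover the same questions")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)

# A respondent who repeats 3 of 4 answers scores 0.75:
score = consistency([5, 3, "yes", 2], [5, 3, "yes", 4])  # 0.75
```

Under this metric, a duplicate with `consistency >= 0.9` would fall in the 38-of-161 group reported above.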
We also examined data differences, because if cross-panel duplicates differ systematically from unique respondents, we might need to adjust for bias. A simple significance test showed a significant difference on 12 of the 62 possible items. However, once Scheffé’s multiple-comparison adjustment was applied, none of these questions showed any difference between the unique and duplicate respondents. Thus, no hidden bias was introduced by including cross-panel duplicates.
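The study applied Scheffé’s adjustment, but the underlying logic - the more tests you run, the stricter each test’s threshold must be - can be illustrated with a simpler Bonferroni-style sketch. This is not the study’s method, just the principle, with assumed p-values for illustration:

```python
def bonferroni_significant(p_values: list, alpha: float = 0.05) -> list:
    """Flag items still significant after a Bonferroni correction:
    each raw p-value must beat alpha divided by the number of tests.
    (The study used Scheffe's method; Bonferroni is shown here only
    because it illustrates the multiple-comparison logic simply.)"""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# With 62 items, a raw p of 0.01 - 'significant' at alpha = 0.05 on its
# own - fails the corrected threshold of 0.05 / 62, roughly 0.0008.
```

This is why 12 items can look different on naive tests yet none survive a proper multiple-comparison adjustment.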
In examining the scene of the crime, we concluded that cross-panel duplicates are not “criminals” trying to game the system. They are honest panelists who happened to receive two invitations to the survey, and they may not even be aware of the problems caused by taking it twice. After all, it is reasonable to provide the same service to two clients and expect compensation both times. Luckily, the online research industry’s leaders have taken steps to keep these “innocent bystanders” playing by the rules.