Ahead of the 2016 U.S. Presidential election, the polling and voter-projection industry predicted Hillary Clinton as the clear favorite to win over Donald Trump. It failed spectacularly. Determining why the pollsters missed the mark so badly has been a common theme of the post-mortem election reports (Trump’s Win Isn’t the Death of Data—It Was Flawed All Along, Wired).

Clearly the methods and models used to predict the voting outcome failed to deliver an accurate forecast. Another article, from The Hill, called it an industry-shattering embarrassment, a sign that the polling industry faces inevitable disruption if it wants to regain trust.

So what went wrong and what should the polling industry do about it?

1. Start using implicit process data. Pollsters rely too heavily on explicit data: the same old exam-style Q&A surveys. Even common sense says “what you say is important, how you say it is even more important.” Explicit data is easy to collect but known to be plagued with quality issues. Yet decades have passed and the same problematic methods for collecting poll data are still in use. Today’s technology makes it possible to collect better explicit data together with implicit process data at scale. The explicit data shows what people are saying; the implicit process data shows how they are saying it. (A rough sketch of what capturing both could look like follows this list.)

2. Do more than tallying: use real data analytics. Political polling is probably the oldest segment under the broad umbrella of market research, with many established players that have become set in their ways. “Margin of error” became the universal quality metric, and many things that should have mattered stopped mattering. For example, tallying should have been the start of the analysis, but too often it became the end as well. Prediction requires more analysis. Diagnosis requires more analysis. Developing action plans requires more analysis too. Tallying alone won’t tell you why something is happening or what to do about it, and without that key information you can’t, and shouldn’t, make predictions. As Cade Metz at Wired summarized, “this wasn’t so much a failure of the data as it was a failure of the people using the data.”

3. Build profiles using people’s priorities. The practice of using demographic information as meaningful “labels” is not only misleading to decision makers and the general public, it is problematic in other ways. A person’s past experiences uniquely define their identity and shape their joys, wishes, worries, and pains, and the causes behind them, in tangible and intangible ways. Every person is unique.

Relying only on convenient demographic labels hides these profiles, and that is a fatal mistake. “White, college-educated” means very little. “Millennials” means very little. Pick any of these segments and you will find it is far more diverse than expected. Action plans aimed at those labels seem specific and directed, but they miss the true targets.


What people care about the most, and what drives their intention to take action: these are the two questions pollsters should have used as the primary dimensions for understanding voters. In this context, demographic information ought to be just a natural outcome at the end. By segmenting voter data along the wrong primary dimensions, what should have stood out as clear trends was instead interpreted by pollsters as noise. They missed, and lost.
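As a rough, hypothetical sketch in Python (not any pollster’s or vendor’s actual implementation), a survey client could record implicit process signals, such as response latency and answer revisions, alongside each explicit answer:

```python
from dataclasses import dataclass
from time import monotonic
from typing import Optional


@dataclass
class ResponseRecord:
    """One question's explicit answer plus implicit process signals."""
    question_id: str
    answer: Optional[str] = None          # explicit data: what was said
    seconds_to_first_answer: float = 0.0  # implicit: hesitation before answering
    answer_changes: int = 0               # implicit: how often the answer was revised


class QuestionSession:
    """Hypothetical client-side tracker for a single survey question."""

    def __init__(self, question_id: str):
        self.record = ResponseRecord(question_id)
        self._shown_at = monotonic()

    def select(self, answer: str) -> None:
        now = monotonic()
        if self.record.answer is None:
            self.record.seconds_to_first_answer = now - self._shown_at
        else:
            self.record.answer_changes += 1
        self.record.answer = answer


# A respondent hesitates, picks one option, then switches to another.
session = QuestionSession("q_vote_intent")
session.select("Candidate A")
session.select("Candidate B")
print(session.record)  # the explicit answer plus the process signals around it
```

The explicit field captures what was said; the latency and revision counts hint at how it was said.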

There are significant reasons why the polling industry needs to make these three fundamental changes right away if it wants any chance of regaining trust.

First, the Internet and digital devices have transformed our world. The identity of today’s media and communication has shifted (rightly or wrongly, justified or not) from being a source of facts to a source of opinions. Short-burst, always-on but always-multitasking daily habits have made individuals less interested in exchanging information or trying to understand another person’s perspective. The quick and easy thing to do is simply to share one’s feelings and pass along one’s opinions; disguised as greater awareness of diversity, people judge each other more than before. Together, these factors magnify the nuanced but deep relationship between injunctive norms and descriptive norms. That is a fancy way of saying that people today are more aware of “what they should say” than ever before. This causes, and will exacerbate, data quality issues if you use only explicit data. “Undercover voters” may be a convenient term that will soon fall out of the news cycle, but there are deep reasons behind it.


Second, data analytics goes well beyond tallying. Among the descriptive, diagnostic, predictive, and prescriptive uses of data analytics, descriptive is only the first step: it tells you what is happening. The national fixation on “margin of error” in polls exists in part because the polling industry limited itself to descriptive analytics. That is the problem, because in such a contentious election, the moment a poll result is in the news it is already history. The valuable things to a campaign have to include why things are happening this way (diagnostic), what we expect to see next week (predictive), and what options we have and how to approach each one (prescriptive). As modern businesses adopt more data analytics by the day, why is the polling industry still fixated on “margin of error” and stuck in the age of pen-and-paper questionnaires?
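A minimal, illustrative sketch of the gap, using invented numbers: a tally is purely descriptive, while even a naive forecast must model something beyond the raw counts (here, a hypothetical per-respondent turnout likelihood):

```python
from collections import Counter

# Invented responses: (stated preference, estimated likelihood of actually voting)
responses = [
    ("A", 0.9), ("A", 0.4), ("B", 0.8),
    ("B", 0.85), ("B", 0.7), ("A", 0.5),
]

# Descriptive: tally what people said.
tally = Counter(choice for choice, _ in responses)
print("Raw tally:", dict(tally))                  # {'A': 3, 'B': 3} -- a dead heat

# Predictive (toy version): weight each response by turnout likelihood.
expected_votes = Counter()
for choice, p_turnout in responses:
    expected_votes[choice] += p_turnout
print("Turnout-weighted:", dict(expected_votes))  # B pulls ahead once turnout is modeled
```

The tally answers “what is happening”; everything after it requires a model, which is exactly the analysis the polling industry skipped.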

Furthermore, and specific to this election, “margin of error” not only lacked analytical value, it actively misled everyone. It is false precision, not informed accuracy. While people may hold differing opinions, the atmosphere of this election, sadly, led them to share the same idea of “what I am not supposed to say.” As a result, their intentions differed but their explicit answers could be the same. That is where tallying things up and reporting the margin of error gave everyone a false picture of where things stood. Ditch tallying and questionnaires. Use a real feedback analytics platform that can get to the implicit process data and infer a deeper level of feeling from people. For example, as one of the vendors in this market niche, Survature has helped NASCAR races learn more about their fan base (in real time) than our whole nation knew about the voters in this presidential election, and all of that was done by one person using a piece of software. It is the methodology that led to better data, which in the end made the difference.
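To make the false-precision point concrete, here is the standard sampling-based margin-of-error calculation with illustrative numbers (not figures from any actual poll). It quantifies only random sampling error and says nothing about a systematic bias, such as respondents giving the “acceptable” answer:

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a simple random sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


n = 1000           # sample size
p_reported = 0.52  # share saying they support Candidate A
moe = margin_of_error(p_reported, n)
print(f"Reported: {p_reported:.0%} +/- {moe:.1%}")   # about +/- 3.1 points

# A systematic response bias is invisible to this formula: if, say, 4% of
# respondents report A while intending B, the actual share is 48%, outside
# what the margin of error implies, no matter how large n gets.
p_actual = p_reported - 0.04
print(f"With a 4-point response bias, actual support: {p_actual:.0%}")
```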

Third, profiling voter segments along the wrong dimensions wastes resources and misses opportunities. The dimensions have to be the issues that matter most to people. Knowing those, you can find better ways to reach people, develop better messaging, and, best of all, truly deliver on the right set of needs for each group of voters. $80 million can be spent quickly on TV ads, but with better information about your audience those ads can make a far more substantial impact.
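As a hedged illustration with invented respondents: grouping by top priority splits a single demographic label into audiences that need very different outreach, a structure that a demographic grouping hides:

```python
from collections import defaultdict

# Invented respondents: the same demographic label hides different priorities.
respondents = [
    {"id": 1, "demo": "white, college-educated", "top_issue": "healthcare costs"},
    {"id": 2, "demo": "white, college-educated", "top_issue": "manufacturing jobs"},
    {"id": 3, "demo": "white, college-educated", "top_issue": "immigration"},
    {"id": 4, "demo": "millennial",              "top_issue": "student debt"},
    {"id": 5, "demo": "millennial",              "top_issue": "manufacturing jobs"},
]

by_demo, by_issue = defaultdict(list), defaultdict(list)
for r in respondents:
    by_demo[r["demo"]].append(r["id"])
    by_issue[r["top_issue"]].append(r["id"])

print(dict(by_demo))   # broad labels: each bucket mixes very different concerns
print(dict(by_issue))  # priority-based segments: each bucket shares one actionable need
```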