Our data-saturated culture loves research and numbers, but information can mislead. A host of factors that many people may not realize play into accuracy. Understanding them is what separates quality, human-driven research from computerized shortcuts, and it’s what makes us all informed citizens and consumers. In this third installment of her three-part series, Candice Bennett, PRC, examines different aspects of data-gathering, what they mean, and why they matter.

You’ve done your due diligence to ensure the research you’ve gathered is accurate, purposeful and actionable. You gave serious thought to engaging the right people with the right questions to gather the information your business needs, and to ensuring that no biases, monetary or otherwise, bled into the results.

But are you taking the right lessons away from the results? Are you understanding the right links, or are you making assumptions? The difference can have a direct impact on the success of any new ventures.

It’s easy to assume that because X happened, and Y followed, X must have caused Y. This is called, as one might expect, “causation.” But just because two things seem to happen in relation to each other doesn’t mean one caused the other -- they might be correlated without being linked by causation. And it's not always easy to tell the difference. A few examples can help demonstrate.


Let’s start with correlation.

Apparently, the divorce rate in Maine correlates with U.S. margarine consumption, as do certain drownings with Nicolas Cage films. Do any of these cause the other? Probably not. (Like what we did there with "probably"?)

Comparisons like these are an example of what we can do with modern tech -- there’s so much data out there, and so many ways to filter it and create visual tools (thank you, "Big Data"), that it’s easy to find something that conforms to a desired shape (consciously or unconsciously). The trends in each pair correlate with each other, but neither causes the other.
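To get a feel for how easily this happens, here is a minimal Python sketch. It is only an illustration, not the method behind the margarine or Nicolas Cage comparisons. It generates series of pure random noise that have nothing to do with each other, then goes looking for the pair that happens to line up best. The function names, the number of series, and the seed are all assumptions chosen for the demo.

```python
import random
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_pair(num_series=200, num_points=10, seed=0):
    """Generate unrelated random series and return the best-correlated pair."""
    rng = random.Random(seed)
    # Every series is independent noise: no series influences any other.
    series = [
        [rng.gauss(0, 1) for _ in range(num_points)]
        for _ in range(num_series)
    ]
    # Compare every possible pair and keep the one with the largest |r|.
    best_pair = max(
        combinations(range(num_series), 2),
        key=lambda pair: abs(pearson_r(series[pair[0]], series[pair[1]])),
    )
    i, j = best_pair
    return best_pair, pearson_r(series[i], series[j])


if __name__ == "__main__":
    pair, r = strongest_chance_pair()
    print(f"Series {pair[0]} and {pair[1]} correlate at r = {r:.2f}")
```

With 200 series there are nearly 20,000 pairs to compare, so some pair will look strongly related by chance alone. The more data you sift through, the more impressive the best coincidence becomes.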

Or take a more realistic example -- say a formerly red county in the swing state of Virginia turned purple and went to Obama in the last election. And say that the average price of a home in that county also went up. You could posit that a wealthier county means more votes for Democrats. But there could be other reasons for the vote to shift: more Millennials coming of voting age, more of them living at home with their parents, general voter frustration with local Republicans, changes in the Democrats’ campaign targets, a swelling of minority voters -- the list could go on. So maybe the county has grown in wealth, but that doesn’t necessarily mean the added wealth caused the shift.
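The idea that something else could be driving both trends can be sketched in a few lines of Python. This is a toy simulation under made-up assumptions, not a model of any real county: a hidden factor pushes two outcomes up and down together, while neither outcome affects the other. The names `hidden`, `outcome_a`, and `outcome_b` are hypothetical stand-ins (think of them as an unmeasured demographic change, home prices, and vote share).

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What remains of ys after removing its least-squares fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(num_samples=2000, seed=1):
    """Return (raw correlation, correlation after accounting for hidden)."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(num_samples)]
    # Both outcomes depend on the hidden factor, but not on each other.
    outcome_a = [h + rng.gauss(0, 0.5) for h in hidden]
    outcome_b = [h + rng.gauss(0, 0.5) for h in hidden]
    raw = pearson_r(outcome_a, outcome_b)
    adjusted = pearson_r(
        residuals(outcome_a, hidden), residuals(outcome_b, hidden)
    )
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate()
    print(f"raw r = {raw:.2f}, after accounting for hidden factor r = {adjusted:.2f}")
```

The two outcomes look tightly linked until the hidden factor is accounted for, at which point the link all but disappears. In real research the hard part is that the hidden factor is rarely handed to you. You have to go looking for it.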

The impact could be huge. If the Republicans wanted to take back the county and bet on targeting wealthy voters, they could completely miss voters’ intentions, unless they knew to invest in more research to pinpoint issues beyond basic correlations. Finding those specific reasons would mean finding direct causation. If your firm is paying for quantitative research, you’ll want to make sure the information you’re receiving is very clearly tied to causation, and not just correlation.


Now, let’s add probability.

Correlation isn’t without its usefulness -- even if the underlying cause isn’t known, a correlation can help us predict the future.

This still depends on careful, accurate statistical analysis. First, we need to figure out how strong the correlation really is. For example, it’s not always a given that a taller person will weigh more -- being tall doesn’t cause you to gain weight per se -- but plotted out with enough data points, we can see that in a lot of cases a taller person does weigh more than a shorter one, and so we might be able to make some population predictions about weight or height moving forward. Sometimes it doesn’t matter if we know the cause, as long as we can establish the correlation firmly enough to make some predictions.
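Here is a rough sketch of what "running the numbers" on the height and weight example might look like in Python. The data is synthetic. The average height of 170 cm, the average weight of 70 kg, and the amount of scatter are invented for illustration and are not real measurements. The sketch measures how strong the correlation is (Pearson's r) and fits a straight line that can be used for rough predictions.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept, plus Pearson r, for ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def synthetic_people(count=500, seed=2):
    """Invented heights (cm) and weights (kg): a trend plus plenty of scatter."""
    rng = random.Random(seed)
    heights = [rng.gauss(170, 10) for _ in range(count)]
    weights = [70 + 0.9 * (h - 170) + rng.gauss(0, 8) for h in heights]
    return heights, weights


heights_cm, weights_kg = synthetic_people()
slope, intercept, r = fit_line(heights_cm, weights_kg)


def predict_weight(height_cm):
    """Predicted average weight for a given height, based on the fitted line."""
    return intercept + slope * height_cm


if __name__ == "__main__":
    print(f"correlation r = {r:.2f}")
    print(f"predicted weight at 160 cm: {predict_weight(160):.1f} kg")
    print(f"predicted weight at 185 cm: {predict_weight(185):.1f} kg")
```

The prediction describes what is typical for a group, not what is true of any one person. Plenty of individual points sit well above or below the line, which is why the strength of the correlation matters before you lean on it.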

Statistical conclusions

So what does all this mean? 

Just because two things happened doesn’t mean one caused the other. Sometimes it’s hard to step back and objectively realize that. But correlation can be useful -- even if you can’t prove one thing caused another, you can run the numbers and get some odds about what might happen in the future (you can see how this would be helpful for business planning).

However, finding these links in a reliable way, especially when it comes to finding the right kinds of correlations (and not the swimmers vs. Hollywood star comparisons), takes a deft hand and critical thinking.

Don't miss the first two articles in Candice Bennett's series, Part 1: The Truth Behind the Numbers and Part 2: How Do You Critically Interpret Data?