I use one of those credit monitoring services that regularly emails me about my credit score: “Congratulations, your score has gone up!” “Uh oh, your score has gone down!”
These fluctuations of a couple of points don’t mean much. I shrug and delete the emails. But what’s causing those fluctuations?
Credit scores are just one example of the many automated decisions made about us as individuals on the basis of complex models. I don’t know exactly what causes those little changes in my score.
Some machine learning models are “black boxes,” a term often used to describe models whose inner workings — the ways different variables ended up related to one another by an algorithm — may be impossible for even their designers to completely interpret and explain.
This opacity raises questions about priorities: How do we prioritize accuracy, interpretability and explainability in the development of models? Does there have to be a tradeoff among those values?
But first, a disclaimer: There are a lot of different ways of defining some of the terms you’ll see here. This article is just one take on this complex issue!
Interpreting and Explaining Models
Let’s take a closer look at interpretability and explainability with regard to machine learning models. Imagine I were to create a highly accurate model for predicting a disease diagnosis based on symptoms, family history and so forth.
If I created a logistic regression model for this purpose, you would be able to see exactly what weights were assigned to each variable in my model to predict the diagnosis (i.e., how much each variable contributed to my prediction).
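To make this concrete, here is a minimal sketch using scikit-learn. The feature names and data are invented for illustration; the point is simply that a fitted logistic regression exposes its weights directly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical predictors: fever, age, family_history (standardized units)
X = rng.normal(size=(200, 3))
# Simulated diagnosis driven mostly by fever, somewhat by family history
y = (2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# The model's inner workings are right there in its coefficients:
for name, weight in zip(["fever", "age", "family_history"], model.coef_[0]):
    print(f"{name}: {weight:+.3f}")
```

Each coefficient is the change in the log-odds of a diagnosis per unit change in that variable, which is exactly what makes this kind of model straightforward to interpret.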
But what if I built a complex neural network model using those same variables? We could look at the layers of the model and their weights, but we might have a difficult time understanding what that configuration actually meant in the “real world,” or, in other words, how the layers and their weights corresponded in recognizable ways to our variables. The neural network model might have lower interpretability, even for experts.
Additionally, we can consider global interpretability (how does the model work for all our observations?) and local interpretability (given these specific data points, how is the model generating a specific prediction?). Both of those levels of understanding have value.
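A tiny hand-built example can illustrate the difference between the two levels. The weights and feature names below are invented; for a linear model, the global view is the magnitude of each weight, while the local view is each feature's contribution to one specific prediction:

```python
import numpy as np

weights = np.array([0.8, -0.3, 0.1])  # hypothetical fitted weights
feature_names = ["income", "debt", "age"]

# Global interpretability: which features matter most across all observations?
global_importance = np.abs(weights)

# Local interpretability: for this one observation, how did each feature
# push the prediction up or down?
x = np.array([2.0, 1.5, 0.5])         # a single observation
local_contributions = weights * x

print(dict(zip(feature_names, global_importance)))
print(dict(zip(feature_names, local_contributions)))
```

Note that the two views can disagree: a globally important feature may contribute little to a particular prediction if that observation's value for it is small.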
As you can imagine, with something like disease prediction, patients would want to know exactly how my model predicted that they had or didn’t have a disease. Similarly, my credit score calculation could have a significant impact on my life. Therefore, we’d ideally like to have models that are not just interpretable by the experts who construct them, but also explainable to people affected by them.
In some cases, explainability is even a legal or regulatory requirement. But even when it isn’t required for a particular situation, it’s still important to be able to communicate about a model’s workings to stakeholders affected by it. Some kinds of models are inherently easier to translate for a less technical audience. For example, some models can be readily visualized and shared.
In other cases, you may have to rely on quantitative measures that show how a model was constructed, but their meaning can be less immediately apparent, especially for non-technical audiences. For example, many statistical models report how each variable relates to the model’s output (e.g., the coefficients for predictor variables in linear regression).
Whichever method is used to gain insight into a model’s operation, being able to discuss how it makes predictions with stakeholders is important for improving the model with their informed input, ensuring the model’s fairness, and increasing trust in its output. This need for insight into the model might make you wonder if black boxes are worth the challenges they pose.
Should Black Boxes Be Avoided? What About Accuracy?
It appears there doesn’t always have to be a tradeoff between accuracy and interpretability, especially given new tools and strategies being developed that lend insight into the operation of complex models.
Tools for Peeking Into Black Boxes
As mentioned above, humans are building tools to better understand the tools they’ve already created! In addition to the visual and quantitative approaches described above, there are a few other techniques that can be used to glimpse the workings of these opaque models.
Python and R packages for model interpretability can lend insight into your models’ functioning. For example, LIME (Local Interpretable Model-agnostic Explanations) creates a local, linear, interpretable model around a specific observation in order to understand how the global model generates a prediction for that data point.
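To show the core idea behind LIME without depending on the lime package itself, here is a hand-rolled sketch of a local linear surrogate. The black-box function, kernel width, and sample counts are invented for illustration: we perturb an instance, weight the perturbations by proximity, and fit a weighted linear model that approximates the black box near that point:

```python
import numpy as np

def black_box(X):
    """Stand-in for an opaque model: a nonlinear scoring function."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_surrogate(x, n_samples=500, scale=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Sample perturbations around the instance of interest.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    y = black_box(Z)
    # 2. Weight samples by closeness to x (Gaussian kernel).
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))
    # 3. Fit a weighted linear model (closed-form weighted least squares).
    A = np.hstack([Z, np.ones((n_samples, 1))])  # add intercept column
    AW = A * w[:, None]
    coef = np.linalg.solve(AW.T @ A, AW.T @ y)
    return coef[:-1]  # the local linear weights, dropping the intercept

x = np.array([0.0, 1.0])
print(local_surrogate(x))  # roughly [1, 2], the black box's local slopes at x
```

The recovered weights approximate the black box’s behavior in the neighborhood of x, which is the kind of local, linear explanation LIME constructs (the real package adds feature selection, categorical handling, and model-agnostic wrappers).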
This video offers a quick overview of LIME from its creators.
Even the operation of complex image recognition algorithms can be glimpsed in part. “Adversarial patches,” or image modifications, can be used to manipulate the classifications predicted by neural networks, and in doing so, reveal how easily these models can be fooled and which visual features drive their predictions.
Whatever approach you take to peek inside your model, being able to interpret and explain its operation can increase trust in the model, satisfy regulatory requirements, and help you communicate your analytic process and results to others.