The challenge facing explainable AI is in creating explanations that are both complete and interpretable: it is difficult to achieve interpretability and completeness simultaneously. The most accurate explanations are not easily interpretable to people; and conversely the most interpretable descriptions often do not provide predictive power.
— from Explaining Explanations: An Overview of Interpretability of Machine Learning
Unpacking some important terms from the quote:
- Interpretable explanation: an explanation that is understandable by humans.
- Complete explanation: an explanation that describes the underlying reality accurately.
We would like ML models to be ‘transparent’. That is, given a model and its output, we’d like to be able to tell how and why the model produced that output. This is particularly useful when things go horribly wrong, e.g. when we create racist AIs, or when an AI decision has particularly heavy consequences and should thus come packaged with a good reason. For example, if you apply for credit and your bank’s automated decision model denies it, you’d probably feel you have the right to an explanation. In Europe, you do.
Extracting explanations from ML models is actually a tricky business in the era of Deep Learning, as neural networks tend to be quite impervious to interpretation. It’s hard to look at the mess of connections and weights and find much ‘meaning’ at all.
While researchers are making some progress on explaining the decisions of current models, it is easy to see that as ML systems get more complex, it becomes harder and harder to maintain interpretability without losing completeness. In fact, given the limitations of our cognition, interpretability is much easier to lose than completeness is to gain, and since below a certain level of interpretability an explanation is fundamentally useless, completeness is what has to give.
In other words, past a certain complexity, if you want an intelligible explanation of what the ML model is doing, you’ll have to settle for the fact that the explanation might not reflect exactly what’s going on under the hood.
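The trade-off above can be made concrete with a surrogate model: fit an interpretable model (here a shallow decision tree) to mimic a black box, and measure its fidelity, i.e. how often it agrees with the black box. This is a minimal sketch, assuming scikit-learn is available; the synthetic dataset and the random-forest "black box" are illustrative choices, not a specific method from any paper. The shallower (more interpretable) the surrogate, the less completely it captures what the black box is doing.

```python
# Sketch: trading completeness for interpretability via surrogate models.
# Assumes scikit-learn; dataset and model choices are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# A synthetic task and a complex "black-box" model.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The surrogate doesn't learn the task -- it learns to mimic the black box.
targets = black_box.predict(X)

def fidelity(max_depth):
    """Fraction of inputs on which a depth-limited surrogate tree
    agrees with the black box's predictions."""
    surrogate = DecisionTreeClassifier(max_depth=max_depth,
                                       random_state=0).fit(X, targets)
    return accuracy_score(targets, surrogate.predict(X))

# Shallower trees are easier to read but agree with the black box less often.
for depth in (2, 5, 10, None):
    print(f"max_depth={depth}: fidelity={fidelity(depth):.3f}")
```

A depth-2 tree fits on a napkin but misrepresents part of the black box's behavior; an unlimited-depth tree tracks it almost perfectly but is no easier to read than the original model.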
Reality vs explanation, rendered with questionable drawing skills
We have known for a while that very little of the workings of our own minds is accessible to us, and even less of the workings of the external world. Yet we still ask ourselves and other people the why of our behavior.
With this premise, to ask yourself why you did something is to invite the part of your mind in charge of weaving narratives to make something up that respects a list of constraints: your model of yourself and the world, the way you think others see you, the way you think others should see you, and so on. Completeness doesn’t really figure in these requirements.
The fact is that, as humans, we just love interpretability much more than we do completeness. Explanations are there to drive behavior, so we tend to value neat, actionable narratives much more than we do complete accounts.
This sets weird incentives when considering our desire for transparent Machine Learning models. If completeness is hard to maintain past a certain complexity, and we didn’t care that much for it in the first place, we are bound to drive ML systems towards interpretability instead. For complex enough systems, the completeness of the explanation might be so out of the question that interpretability is all that’s left, leaving us to drive explanations towards the interpretations we like the most, no matter how far they stray from the underlying reality.
I think that’s actually exactly what happens with humans and with the explanations we come up with for our own and other people’s behavior. It is only when we get marginally more exposure to our inner workings (through e.g. psychedelics or meditation) that we can directly perceive the incongruence between the explanations we create and what’s actually going on – leaving us baffled and in need of new explanations.