Would you trust an artificial intelligence algorithm that works eerily well, making accurate decisions 99.9% of the time, but is a mysterious black box? Every system fails every now and then, and when it does, we want explanations, especially when human lives are at stake. And a system that can't be explained can't be trusted. That is one of the problems the AI community faces as its creations become smarter and more capable of tackling complicated and critical tasks.
In the past few years, explainable artificial intelligence has become a growing field of interest. Scientists and developers are deploying deep learning algorithms in sensitive fields such as medical imaging analysis and self-driving cars. There is concern, however, about how these AI systems operate. Investigating the inner workings of deep neural networks is very difficult, and their engineers often can't determine which key factors contribute to their output.
For instance, suppose a neural network has labeled the image of a skin mole as cancerous. Is it because it found malignant patterns in the mole, or is it because of irrelevant elements such as image lighting, camera type, or the presence of some other artifact in the image, such as pen markings or rulers?
Researchers have developed various interpretability techniques that help investigate decisions made by machine learning algorithms. But these methods are not enough to address AI's explainability problem and create trust in deep learning models, argues Daniel Elton, a scientist who researches the applications of artificial intelligence in medical imaging.
Elton discusses why we need to shift from techniques that interpret AI decisions to AI models that can explain their decisions by themselves, as humans do. His paper, "Self-explaining AI as an alternative to interpretable AI," recently published on the arXiv preprint server, expands on this idea.
Whatâ€™s wrong with current explainable AI methods?
Classic symbolic AI systems are based on manual rules created by developers. No matter how large and complex they grow, their developers can follow their behavior line by line and investigate errors down to the machine instruction where they occurred. In contrast, machine learning algorithms develop their behavior by comparing training examples and creating statistical models. As a result, their decision-making logic is often ambiguous even to their developers.
Machine learning's interpretability problem is both well-known and well-researched. In the past few years, it has drawn interest from esteemed academic institutions and from DARPA, the research arm of the U.S. Department of Defense.
Efforts in the field generally fall into two categories: global explanations and local explanations. Global explanation techniques focus on finding general interpretations of how a machine learning model works, such as which features of its input data it deems most relevant to its decisions. Local explanation techniques focus on determining which parts of a particular input are relevant to the decision the AI model makes. For instance, they might produce saliency maps of the parts of an image that contributed to a specific decision.
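The local-explanation idea can be made concrete with a minimal sketch of a gradient-based saliency map. The "network" below is a toy two-layer model standing in for a trained classifier, and the gradient is taken by finite differences rather than backpropagation; real tools (e.g., vanilla gradients or Grad-CAM) operate on actual deep networks, but the principle is the same: the pixels whose perturbation changes the score the most are the most "salient."

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: flattened 4x4 image -> ReLU hidden layer -> score.
# (Hypothetical weights; a real saliency map would use a trained model.)
W1 = rng.normal(size=(8, 16))   # hidden-layer weights
W2 = rng.normal(size=(8,))      # output weights

def score(image_flat):
    hidden = np.maximum(0.0, W1 @ image_flat)  # ReLU activation
    return W2 @ hidden

def saliency_map(image):
    """Absolute finite-difference gradient of the score w.r.t. each pixel."""
    flat = image.ravel()
    eps = 1e-5
    grad = np.zeros_like(flat)
    for i in range(flat.size):
        bumped = flat.copy()
        bumped[i] += eps
        grad[i] = (score(bumped) - score(flat)) / eps
    return np.abs(grad).reshape(image.shape)

image = rng.random((4, 4))
sal = saliency_map(image)
# The pixel with the largest gradient magnitude contributed most
# to this particular decision.
print(np.unravel_index(np.argmax(sal), sal.shape))
```

Note that the map explains only one input at a time, which is exactly what makes it a local, rather than global, explanation.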
All these techniques "have flaws, and there is confusion regarding how to properly interpret an interpretation," Elton writes.
Elton also challenges another popular belief about deep learning. Many scientists believe that deep neural networks extract high-level features and rules from their underlying problem domain. This means that, for instance, when you train a convolutional neural network on many labeled images, it will tune its parameters to detect various features shared between them.
This is true, depending on what you mean by "features." There's a body of research that shows neural networks do in fact learn recurring patterns in images and other data types. At the same time, there's plenty of evidence that deep learning algorithms do not learn the general features of their training examples, which is why they are rigidly limited to their narrow domains.
"Actually, deep neural networks are 'dumb' – any regularities that they appear to have captured internally are solely due to the data that was fed to them, rather than a self-directed 'regularity extraction' process," Elton writes.
Citing a paper published in the peer-reviewed journal Neuron, Elton posits that, in fact, deep neural networks "function through the interpolation of data points, rather than extrapolation."
Some research is focused on developing "interpretable" AI models to replace current black boxes. These models make their reasoning logic visible and transparent to developers. In many cases, however, especially in deep learning, swapping an existing model for an interpretable one results in a tradeoff in accuracy. This is self-defeating, because we opt for more complex models precisely because they provide higher accuracy in the first place.
"Attempts to compress deep neural networks into simpler interpretable models with equivalent accuracy typically fail when working with complex real-world data such as images or human language," Elton notes.
Your brain is a black box
One of Elton's main arguments is for adopting a different view of understanding AI decisions. Most efforts focus on breaking open the "AI black box" and figuring out how it works at a very low and technical level. But when it comes to the human brain, the ultimate destination of AI research, we've never had such reservations.
"The human brain also appears to be an overfit 'black box' which performs interpolation, which means that how we understand brain function also needs to change," he writes. "If evolution settled on a model (the brain) which is uninterpretable, then we expect advanced AIs to also be of that type."
What this means is that when it comes to understanding human decisions, we seldom investigate neuron activations. There's a lot of research in neuroscience that helps us better understand the workings of the brain, but for millennia, we've relied on other mechanisms to interpret human behavior.
"Interestingly, although the human brain is a 'black box', we are able to trust each other. Part of this trust comes from our ability to 'explain' our decision making in terms which make sense to us," Elton writes. "Crucially, for trust to occur we must believe that a person is not being deliberately deceptive, and that their verbal explanations actually map onto the processes used in their brain to arrive at their decisions."
One day, science might enable us to explain human decisions at the neuron activation level. But for the moment, most of us rely on understandable, verbal explanations of our decisions and the mechanisms we have to establish trust between each other.
The interpretation of deep learning, however, is focused on investigating activations and parameter weights instead of high-level, understandable explanations. "As we try to accurately explain the details of how a deep neural network interpolates, we move further from what may be considered relevant to the user," Elton writes.
Self-explainable artificial intelligence
Based on the trust and explanation model that exists between humans, Elton calls for "self-explaining AI" that, like a human, can explain its decisions.
An explainable AI yields two pieces of information: its decision and the explanation of that decision.
This is an idea that has been proposed and explored before. However, what Elton proposes is self-explaining AI that still maintains its complexity (e.g., deep neural networks with many layers) and does not sacrifice its accuracy for the sake of explainability.
In the paper, Elton suggests how relevant causal information can be extracted from a neural network. While the details are a bit technical, the technique essentially extracts meaningful, relevant information from the neural network's layers while avoiding spurious correlations. His method builds on existing self-explaining AI systems developed by other researchers and verifies whether the explanations and predictions in their neural networks actually correspond.
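The general shape of such systems (a sketch of the architecture family Elton builds on, not his specific method) can be illustrated with a model that feeds shared features into two heads: one produces the prediction, the other produces a per-feature explanation, and a sanity check verifies that the explanation is faithful to the prediction. All weights below are toy, randomly initialized stand-ins for a trained model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared feature extractor (toy: one linear layer + ReLU).
W_feat = rng.normal(size=(8, 10))

def features(x):
    return np.maximum(0.0, W_feat @ x)

# Prediction head over the shared features.
W_pred = rng.normal(size=(8,))

def predict(x):
    return float(W_pred @ features(x))

def explain(x):
    # Explanation head: the contribution of each learned feature to the
    # prediction (exact for a linear head: activation times output weight).
    return W_pred * features(x)

def explanation_is_faithful(x, tol=1e-8):
    # Verify that explanation and prediction correspond: the per-feature
    # contributions must sum back to the model's output.
    return abs(explain(x).sum() - predict(x)) < tol

x = rng.random(10)
print(predict(x), explanation_is_faithful(x))
```

For this linear head the faithfulness check passes by construction; the point of the check is that in deeper, nonlinear architectures it can fail, which is exactly the failure mode a self-explaining system needs to detect.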
In his paper, Elton also discusses the need to specify the limits of AI algorithms. Neural networks tend to provide an output value for any input they receive. Self-explainable AI models should "send an alert" when results fall "outside the model's applicability domain," Elton says. "Applicability domain analysis can be framed as a simple form of AI self-awareness, which is thought by some to be an important component for AI safety in advanced AIs."
Self-explainable AI models should provide confidence levels for both their output and their explanation.
Applicability domain analysis is especially important "for AI systems where robustness and trust are important, so that systems can alert their user if they are asked to work outside their domain of applicability," Elton concludes. An obvious example would be health care, where errors can result in irreparable damage to health. But there are plenty of other areas, such as banking, loans, recruitment, and criminal justice, where we need to know the limits and boundaries of our AI systems.
Much of this is still hypothetical, and Elton provides little in terms of implementation details, but it is a nice direction to follow as the explainable AI landscape develops.