The Illusion of Understanding: MIT Unmasks the Myth of AI’s Formal Specifications

A study by MIT Lincoln Laboratory suggests that formal specifications, despite their mathematical precision, are not necessarily interpretable to humans. Participants struggled to validate AI behaviors using these specifications, indicating a discrepancy between theoretical claims and practical understanding. The findings highlight the need for more realistic assessments of AI interpretability.

Some researchers see formal specifications as a way for autonomous systems to “explain themselves” to humans. But a new study finds that we aren’t understanding.

As autonomous systems and artificial intelligence become increasingly common in daily life, new methods are emerging to help humans check that these systems are behaving as expected. One method, called formal specifications, uses mathematical formulas that can be translated into natural-language expressions. Some researchers claim that this method can be used to spell out decisions an AI will make in a way that is interpretable to humans.

Research Findings on Interpretability

Humans Struggle To Understand Outputs of Formal Specifications

A study finds humans struggle to understand the outputs of formal specifications, a method that some researchers claim can be used to make AI decision-making interpretable to humans. Credit: Bryan Mastergeorge

“The results are bad news for researchers who have been claiming that formal methods lent interpretability to systems. It might be true in some restricted and abstract sense, but not for anything close to practical system validation,” says Hosea Siu, a researcher in the laboratory’s AI Technology Group. The group’s paper was accepted to the 2023 International Conference on Intelligent Robots and Systems held earlier this month.

The Importance of Interpretability

Interpretability is important because it allows humans to place trust in a machine when used in the real world. If a robot or AI can explain its actions, then humans can decide whether it needs adjustments or can be trusted to make fair decisions. An interpretable system also enables the users of technology — not just the developers — to understand and trust its capabilities. However, interpretability has long been a challenge in the field of AI and autonomy. The

“Researchers confuse the fact that formal specifications have precise semantics with them being interpretable to humans. These are not the same thing,” Siu says. “We realized that next-to-nobody was checking to see if people actually understood the outputs.”

In the team’s experiment, participants were asked to validate a fairly simple set of behaviors with a robot playing a game of capture the flag, basically answering the question “If the robot follows these rules exactly, does it always win?”

Participants included both experts and nonexperts in formal methods. They received the formal specifications in three ways — a “raw” logical formula, the formula translated into words closer to natural language, and a decision-tree format. Decision trees in particular are often considered in the AI world to be a human-interpretable way to show AI or robot decision-making.

The results: “Validation performance on the whole was quite terrible, with around 45 percent

Overconfidence and Misinterpretation

Those previously trained in formal specifications only did slightly better than novices. However, the experts reported far more confidence in their answers, regardless of whether they were correct or not. Across the board, people tended to over-trust the correctness of specifications put in front of them, meaning that they ignored rule sets allowing for game losses. This confirmation bias is particularly concerning for system validation, the researchers say, because people are more likely to overlook failure modes.

“We don’t think that this result means we should abandon formal specifications as a way to explain system behaviors to people. But we do think that a lot more work needs to go into the design of how they are presented to people and into the workflow in which people use them,” Siu adds.

When considering why the results were so poor, Siu recognizes that even people who work on formal methods aren’t quite trained to check specifications as the experiment asked them to. And, thinking through all the possible outcomes of a set of rules is difficult. Even so, the rule sets shown to participants were short, equivalent to no more than a paragraph of text, “much shorter than anything you’d encounter in any real system,” Siu says.

The team isn’t attempting to tie their results directly to the performance of humans in real-world robot validation. Instead, they aim to use the results as a starting point to consider what the formal logic community may be missing when claiming interpretability, and how such claims may play out in the real world.

Future Implications and Research

This research was conducted as part of a larger project Siu and teammates are working on to improve the relationship between robots and human operators, especially those in the military. The process of programming robotics can often leave operators out of the loop. With a similar goal of improving interpretability and trust, the project is trying to allow operators to teach tasks to robots directly, in ways that are similar to training humans. Such a process could improve both the operator’s confidence in the robot and the robot’s adaptability.

Ultimately, they hope the results of this study and their ongoing research can better the application of autonomy, as it becomes more embedded in human life and decision-making.

“Our results push for the need to do human evaluations of certain systems and concepts of autonomy and AI before too many claims are made about their utility with humans,” Siu adds.

Reference: “STL: Surprisingly Tricky Logic (for System Validation)” by Ho Chit Siu, Kevin Leahy and Makai Mann, 26 May 2023, Computer Science > Artificial Intelligence.arXiv:2305.17258