This post is crossposted from my Substack, Structure and Guarantees, where I explore how formal verification might scale to more complex intelligent systems.
The starting point here is the familiar idea of predictive coding, where a learning loop refines a model of the world based on prediction errors. The twist is to expand that loop: not just improving the model, but also modifying the world to make it easier to model.
That shift toward making systems more legible can lead to more effective designs overall, including by bringing more problems into the range where fast, reliable symbolic methods apply.
The last few posts have covered how, while today’s mainstream ideas in generative AI have phenomenal capabilities in searching large data sets, they have serious downsides in both how long they take to return answers and how reliable those answers are. It’s natural to expect approaches to fall on a trade-off spectrum: if an approach becomes very popular, it presumably scores very well on at least one of speed or answer reliability, so it may be surprising that the reigning style has issues in both dimensions. I briefly started making the case that more symbolic, logic-based methods have promise to address both complaints. So why aren’t such methods taking over the world already?
The straightforward answer is that, so far, they’ve demonstrated dramatically worse answer quality on most problems of high interest today. If we take a quick survey of popular applications of AI, we’ll find deep learning and friends way out in front on each one. But does that summary really imply that the future is centered on statistical machine learning? I’m going to present an alternative framework that justifies an answer of “no,” though first I want to summarize a more mainstream framework.
Predictive Coding
Predictive coding is a theory of intelligence that is popular with people thinking about powerful AI. Roughly, the theory of predictive coding encourages us to go beyond simplistic models that assume our senses perceive the objective truth of the world directly. Instead, our brains need to develop fairly detailed internal models of the world, and we can integrate sensory inputs only with respect to those models, finding the model-compatible world state that best matches the inputs. When we notice bad results from our current models, we update them by considering the details of what went wrong, much like how training deep-learning models uses backpropagation to reverse-engineer inaccurate decisions into the right changes to model parameters.
This diagram shows the basic idea in a learning loop. The observer fine-tunes his world model to help him better perceive a vase. Cases where the model makes bad predictions at odds with new inputs trigger modifications to the model.
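To make the loop concrete, here is a minimal sketch of my own (not drawn from any particular predictive-coding implementation) in which the world model is a tiny parametric predictor and prediction errors drive the updates:

```python
import numpy as np

# Minimal sketch of a predictive-coding-style learning loop (illustrative only).
# The "world model" is a small parameter vector; perception means predicting what
# the senses will report, and prediction errors drive updates to the model.

rng = np.random.default_rng(0)
true_world = np.array([0.7, -1.2])   # hidden structure we are trying to model
model = np.zeros(2)                  # our initial, ignorant world model

def predict(model, stimulus):
    # The model's expectation of what the senses will report for this stimulus.
    return stimulus @ model

for step in range(1000):
    stimulus = rng.normal(size=2)        # something happens in the world
    observation = stimulus @ true_world  # what the senses actually report
    error = observation - predict(model, stimulus)  # prediction error: the learning signal
    model += 0.05 * error * stimulus     # nudge the model to reduce future error

print(model)  # drifts toward true_world as the loop absorbs the world's structure
```

Backpropagation in deep networks plays the same role as that last update line, just scaled up to billions of parameters.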
Why do we find ourselves in situations where modeling phenomena of interest is very challenging? I’ll focus on one broad kind of mechanism: when those phenomena are produced by evolutionary processes that don’t optimize for legibility. That is, some evolutionary process provides feedback on intermediate designs, but it doesn’t penalize designs that are hard to understand.
Even if we were optimizing explicitly for legibility, we could find evolution getting stuck in local optima. The reason is that evolution proceeds through small changes, each of which needs to improve fitness in at least some small way; otherwise the variant is discarded. There may exist radically more legible redesigns that nonetheless can’t be reached through sequences of small steps that all improve legibility and/or whatever other objectives. (Maybe we find a path of gradually increasing legibility that reduces real-world practicality at intermediate points, even though practicality spikes upward by the end.)
It’s not just biological evolution that can exhibit this problem. We can also see it in deliberately engineered systems, through viewing ourselves as a distributed system for finding better technical ideas. I wrote previously about signaling as an optimization for such systems, where participants go out of their way to show off their otherwise-hidden fitness qualities through costly displays, providing an earlier signal that helps the optimization algorithm prune unpromising paths. The presence of signaling can actually lead toward optimizing (parts of) systems for worse legibility, to provide opportunities for particularly exaggerated displays of competence.
The upshot is that we may very well have wound up with a variety of canonical AI problems that seem, with respect to our current knowledge, to be irreducibly complex: where only the kind of large, unstructured model produced by deep learning can deliver good-enough decisions. We could even take a cue from Kolmogorov complexity and define the complexity of a problem as the length of the shortest model that understands it well enough. The latest foundation models are described in terabits of weights – where “tera” means 10 to the 12th power. On the one hand, we can celebrate the engineering achievement of training models with so much relatively unstructured complexity, and recognize that complex descriptions are probably necessary to understand a variety of important phenomena. On the other hand, more-complex models tend to be more expensive to find and execute. What both predictive coding and the practice of machine learning share is learning via loops that tend to increase complexity: as a model is fitted better and better to an underlying phenomenon, the model gets more complex, or it converges to better results because there was complexity inherent in its architecture from the start. What if we extended such loops so that they could also include steps that by design reduce complexity?
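To make the description-length intuition tangible, here is a toy proxy of my own (Kolmogorov complexity itself is uncomputable, so off-the-shelf compression stands in very crudely): a structured phenomenon takes far fewer bits to describe than an unstructured one of the same size.

```python
import os
import zlib

# Crude illustration: compressed length as a stand-in for description complexity.
# True Kolmogorov complexity is uncomputable; this is only a rough proxy.

def description_bits(data: bytes) -> int:
    """Bits needed to describe `data` using an off-the-shelf compressor."""
    return 8 * len(zlib.compress(data, level=9))

structured = bytes(i % 16 for i in range(10_000))   # a highly regular "world"
unstructured = os.urandom(10_000)                   # a world with no structure to exploit

print(description_bits(structured))    # small: a short model captures the whole pattern
print(description_bits(unstructured))  # close to 80,000 bits: essentially incompressible
```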
Codesign for Legibility
It’s largely taken for granted today that the way to make progress in the face of challenging reasoning problems is to improve AI systems. However, another technique can be even more powerful: changing problems to be easier to solve. The revised problems can be better fits for AI approaches with superior properties in speed and reliability. Classical rule-based systems take excellent advantage of well-defined logical structure when it exists, but they often fail completely in domains where such structure is not known. Can we redesign systems at a higher level to help structure become clear?
I wrote previously about codesign with the example of autonomous vehicles. The broader idea is that we can make AI problems simpler by changing the contexts they operate in. Using that idea, we can take the predictive-coding world of a learning loop and transform it into a codesign loop that alternates between steps of learning and changing the world being learned, with an eye toward improved legibility, or ease of learning.
Returning to our idea of measuring complexity in terms of the bit counts needed to describe the world, we are now combining steps that refine models, which may indeed add bits, with steps that simplify the world, which, applied properly, actually reduce the complexity of effective models. This framing suggests a very different perspective than celebrating engineering that enables learning many bits. Instead, we see progress as coming from changing the world to require as few bits to describe as we can get away with. Such compression of knowledge comes from structure that maps well to the world, which can in turn support crisp guarantees. This next stylized graph shows how alternating these kinds of changes helps us wiggle description complexity downward. Even after investing in more-complex models, we can find paths back to simplicity, guided by what would help the models, through world simplification.
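Here is a stylized sketch of that codesign loop, under assumptions of my own choosing: the “world” is a signal we must model, model complexity is measured as the number of Fourier coefficients needed to reconstruct it within tolerance, and the codesign step simply smooths away fine structure so a shorter model suffices. None of these choices are canonical; they just make the alternation runnable.

```python
import numpy as np

# Toy codesign loop (illustrative only). The learning step is the search for the
# smallest adequate model; the codesign step re-engineers the world so that an even
# smaller model will do next round.

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 256, endpoint=False)
world = (np.sin(2 * np.pi * x)                      # legible large-scale structure
         + 0.4 * np.sin(2 * np.pi * 30 * x)         # fussy fine detail
         + 0.15 * rng.normal(size=x.size))          # illegible noise

def model_complexity(signal, tol=0.05):
    """Fewest Fourier coefficients whose reconstruction stays within RMS `tol`."""
    coeffs = np.fft.rfft(signal)
    order = np.argsort(np.abs(coeffs))[::-1]        # most informative coefficients first
    kept = np.zeros_like(coeffs)
    for k, idx in enumerate(order, start=1):
        kept[idx] = coeffs[idx]
        residual = np.fft.irfft(kept, len(signal)) - signal
        if np.sqrt(np.mean(residual ** 2)) < tol:
            return k
    return len(coeffs)

for round_number in range(3):
    print("model complexity this round:", model_complexity(world))
    # Codesign step: simplify the world itself, removing its least legible structure.
    world = np.convolve(world, np.ones(9) / 9, mode="same")
```

The printed complexity falls round over round: the same tolerance is met by ever-shorter models, not because the learning got smarter but because the world got simpler.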
From this perspective, we should change the world to remove challenging AI problems as far as possible, guided by experience building earlier generations of systems. Here are a few opportunities of this kind, most of which I covered in earlier posts, each highlighting evolutionary dynamics that have left environments harder than necessary for intelligent agents. Each example identifies a hard problem associated with AI and finds a way to replace it with a better-structured alternative that streamlines automation. I’m going to cover these suggestions in increasing order of controversy or of effort required to reconfigure the world.
One good example has already been adopted widely in software engineering. The first web applications were designed just to be used directly by humans with browser GUIs. However, soon enough some users wanted programs to access web applications on their behalf. Initial efforts involved unpleasant engineering approaches like scraping, which required software to understand both natural language and visual layout. Eventually the owners of many web applications started offering web APIs, ways of accessing the same services in forms friendlier to automated understanding, requiring solving exactly no problems considered “AI.”
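The contrast shows up directly in code. A small sketch (the page snippet, response, and field names are hypothetical, purely for illustration): the scraping route has to recover meaning from presentation, while the API route receives structured data outright.

```python
import json
import re

# Illustrative contrast between scraping and a web API (hypothetical page and
# response, not any real service).

# Scraping: dig the price out of presentation markup and hope the layout never changes.
html = '<div class="product"><span class="label">Price:</span> <b>$19.99</b></div>'
match = re.search(r"Price:</span>\s*<b>\$([0-9.]+)</b>", html)
price_scraped = float(match.group(1)) if match else None

# Web API: the same fact arrives as structured data; no layout understanding required.
api_response = '{"product": {"name": "widget", "price_usd": 19.99}}'
price_from_api = json.loads(api_response)["product"]["price_usd"]

print(price_scraped, price_from_api)
```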
Software programming is a domain where we are in control, with humans having deliberately designed the programming languages and other tools. Programming languages have always been designed to make programs easier for humans to understand, though inertia leaves some bad design ideas in place. There is even some signaling going on, where language designers sometimes include complexities because they introduce puzzles that programmers enjoy solving. Instead of sticking with languages that happened to have many examples included in the first big LLM training runs, we should change the way programming works so that AI has an easier time spotting bugs and so on.
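As one small, hedged illustration of redesigning for legibility (my own toy, not a recommendation of any specific language feature): encoding units in types turns a bug that only testing or careful reading would catch into one that a static checker, or an AI assistant reading the code, can flag immediately.

```python
from typing import NewType

# Illustrative sketch: distinct types for distinct units make a whole class of bugs
# visible to static tools (e.g., mypy) instead of hiding them behind bare floats.

Meters = NewType("Meters", float)
Feet = NewType("Feet", float)

def braking_distance(speed_mps: float, deceleration: float = 7.0) -> Meters:
    return Meters(speed_mps ** 2 / (2 * deceleration))

def fits_before_obstacle(distance: Meters, gap: Meters) -> bool:
    return distance <= gap

gap_in_feet = Feet(150.0)
# A type checker reports an error on the next call (Feet is not Meters), even though
# the program still runs; with bare floats, the unit confusion would pass silently.
print(fits_before_obstacle(braking_distance(30.0), gap_in_feet))
```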
We should change the environments that autonomous vehicles inhabit to simplify vision and other problems relevant to effective control. Instead of roads full of unpredictable phenomena, in some settings, a model more like subway tunnels is appropriate. Today’s road network evolved with human drivers in mind, not considering the possibilities for cost savings through standardization and simplification.
Ramping further up the scale of speculation: moving away from natural language will simplify many relevant problems. As more of the economy is dominated by AI agents, they will have options besides natural language for coordinating with each other. Natural languages are creaky machines that were shaped by evolutionary processes that didn’t select strongly for lack of ambiguity or simplicity of processing. In fact, signaling pushes toward more-complex language that can be used to show off cognitive ability. Protocols for communication by AI agents needn’t maintain any of that baggage. (Actually, the prior example of adoption of web APIs is an early case of this principle reduced to practice very effectively!)
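As a sketch of what such a protocol might look like (the schema and field names here are hypothetical, invented only to make the contrast vivid): instead of a free-form English request that the receiving agent must parse and disambiguate, agents exchange messages whose fields, units, and deadlines are fixed by a shared schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical inter-agent message schema (illustration only, not an existing protocol).

natural_language_request = (
    "Could you get me a couple hundred of those widgets sometime soon, "
    "assuming the price is reasonable?"
)

@dataclass
class PurchaseRequest:
    item_sku: str
    quantity: int
    max_unit_price_usd: float
    deliver_by_utc: str  # ISO 8601 timestamp

structured_request = PurchaseRequest(
    item_sku="WIDGET-STD-01",
    quantity=200,
    max_unit_price_usd=3.50,
    deliver_by_utc="2026-03-01T00:00:00Z",
)

# Every ambiguity in the sentence above ("a couple hundred", "soon", "reasonable")
# has been resolved before the message is ever sent.
print(json.dumps(asdict(structured_request), indent=2))
```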
I can’t resist dropping in one more idea that is quite speculative, which you can take or leave, independently of buying into this broader framework of codesign. Biology is full of mysteries, which are mysterious to our puny brains because, with the possible minute exception of the most-recent past, there hasn’t been evolutionary pressure towards organisms being able to understand their own workings. We can keep working on reverse-engineering our evolved mechanisms, but the long run may see even better results from replacing parts of ourselves. For instance, artificial replacement organs may follow engineering best practices and be easier to understand and optimize than natural organs.
The general approach, an improvement loop that combines learning with modification of the phenomenon being learned, is an underappreciated secret weapon on the path to effective intelligent systems. We’re considering the word “system” in a broad sense that keeps the above codesign examples in scope.
Consequences for Feasible Automated Understanding
It may be that today’s canonical AI problems require learning mathematical functions of seemingly irreducible complexity, making logic-based methods fundamentally inapplicable. However, by changing the world so that it exposes different decision problems, we can make logic-based reasoning competitive. By giving the world legible structure, we enable methods that thrive on structure, potentially enjoying benefits in performance, explainability, and mathematical guarantees broadly. Two of the major categories for upcoming posts are (1) more specifics of how the world can and should be different to support effective automated reasoning and (2) the right architecture for reasoning systems that take advantage of structure, looking across the whole stack of hardware and software. Pulling on these threads will take us in a pretty different direction from the industry consensus.