This post is crossposted from my Substack, Structure and Guarantees, where I explore how formal verification and related ideas might scale to more complex intelligent systems. User interfaces have traditionally been among the least formalizable parts of software, because they require reasoning about the limitations and preferences of human users. Here I argue that this challenge may shrink dramatically as more of the economy becomes AI agents interacting directly with each other.
My last few posts have been reviewing ways that we should be optimistic about the potential to write good formal specifications for important software of the future, enabling formal verification and other ways to increase confidence in those systems. By doing the kind of serious formal verification that includes reasoning about the interfaces between system components, we often get additional cybersecurity benefits for free. More speculatively, we are probably at the beginning of an accelerating curve for reconfiguring our economy so that more of it involves AIs interacting directly with each other, saving the related software from dealing with the complexities of modeling humans.
The upshot of that last trend is likely to be a dramatic complexity drop in the part of specification-writing that has dealt with user interfaces. I’ll present three ways in which it becomes easier to make rigorous arguments that these new software components make their “users” “happy.”
Signaling and Interface Trends
Interfaces to computers have changed significantly over time. A significant part of the change reflects genuine conceptual breakthroughs in how to design interfaces to promote user satisfaction. Other interface ideas depended on raw computational power: it may have been clear to earlier designers what they wished they could show to users, but bringing those ideas to life had to wait for compute to scale. Most observers probably have those factors in mind when thinking back along the history of user interfaces.
This cartoon shows computer users wearing the kinds of clothing associated with their time periods, because I want to draw a connection between computer user interfaces and fashion in clothing. I think this analogy explains only a modest fraction of the cost of creating and maintaining user interfaces today, but it’s a real source of costs that I expect should now decline in importance.
Fashion is widely acknowledged as a great example of signaling and status competition. Sure, some changes in clothing arise from technological improvements that expand the space of what is possible to fabricate at reasonable cost. However, we accept that change in this domain is overwhelmingly driven by aesthetics.
Fashion trends that differ most subtly from prior ones are arguably the most valuable for signaling. The hardest design job may be new fashion whose advantage is clear only to insiders. Those insiders then get to show off their savvy by picking out only the new fashions that succeed at the game. In effect, the participants in this status contest are showing off their working memories, maintaining mental mappings from fashion trends to their timelines, so they can distinguish styles that are unfamiliar because they’re so last decade from styles that represent an avant-garde worth getting in on early. The general cognitive load of this process makes it a useful measure of the intelligence of potential mates and coalition partners, whether we are talking about “book smarts” or the social intelligence needed to stay on top of group dynamics.
I would argue that computer user interfaces include a similar dynamic, not as centrally as for clothing fashion, but still important enough to affect adoption trends. Now, I have a selfish reason to make this case, as a specialist in the more “backend” parts of computer systems. When I design user interfaces, they almost universally get panned as archaic. I give the objectors the benefit of the doubt that they are noting some overlooked genuine practical innovations in how to boost user productivity. However, it sure feels to me like there is an element of remembering interface vibes from different eras and wanting to be reminded of only the latest aesthetic, independently of the fundamentals behind each style. The explanation definitely can’t be that I still remember being excited in high school as HTML tables and frames first came out, right?
Whatever the role of signaling in user interfaces, we might hope that AI agents will have better things to do than worry about fashion. What formal analysis we may perform can focus on utilitarian requirements.
The Rise of Minimalist, Structured Interfaces
We don’t need to rely on sci-fi futurist mode to see important interface changes. A 2025 Salesforce article argued that GUIs would decrease in importance as AI agents proliferate. Another article from just days ago makes the case for the increasing importance of highly technical interfaces like REST APIs, which are convenient to use from programs (software using software). If the “workers” of the future are increasingly software, then naturally they’ll favor this kind of interface.
Let’s get a bit more specific. I’ll argue that human-facing UIs are greatly influenced by limitations in human working memory, what we can hold front-of-mind at once. Take the example of the controls for an airplane.
The airplane is a complex distributed system, with both digital and analog parts, their full, confusing detail visualized in the leftmost panel. In theory, a pilot can accept input from any of the sensors in the system or send requests to any of the actuators. The trouble is that humans, even the elite ones trained as pilots, can’t remember the full interface at that level. An interface designer saves the user from a search process that crawls over a full system description and brings back the parts that are relevant; the result is the actual cockpit shown in the middle panel. With software design as we are used to it, the developer performs that expensive search up-front, stocking each “screen” of a program with exactly the input and output options deemed relevant to that stage of execution. This framing fits with the theory of distributed cognition, which holds that representations are split across participants in a system.
However, we don’t get the full story, just thinking about what is relevant to the user at each step. A user will often have some mission spread out across multiple steps, and working memory also makes it difficult to track mission status and what lies ahead. So the goal of a GUI in a single moment is more than just displaying the system’s affordances relevant to that moment: it also needs to summarize consequences of past decisions (how we got here) and options for next steps (where we may go). At some level, all of the bits of state throughout the whole system are sufficient to track both aspects, but our working memory can’t hold all of those bits, so we get the interface designer’s guess at what modestly large read-out will remind us where we are in a search process.
AI agents, represented in the rightmost panel, are much better positioned to keep all relevant state in memory, accessible at reasonable cost. There are still differences in accessing different pieces of stored information, exemplified by memory hierarchies and context windows, but even the slowest storage method represents a huge win over what humans can muster. As a result, your canonical AI agent is ready to confront something like a fundamental description of a system’s parts and capabilities, which looks more like a data schema or an API specification than a GUI. My personal view is that designing better abstractions will remain important for increasing effectiveness of these systems, but so much can be accomplished with relative brute force.
Hence, in spelling out what makes a program acceptable, we need not worry about situation-specific details of interfaces and the capabilities of users to understand them. Instead, we can present relatively literal descriptions of what systems can do, an exercise typically much easier to get right. Humans remain important consumers of APIs today, requiring a certain amount of abstraction work by API designers, which may become increasingly unnecessary as AI usage predominates. (And, don’t worry: in later posts I’ll keep arguing for the importance of abstraction in making future systems faster and more reliable. It just seems the argument here is strong enough even continuing to lean heavily into brute force, as is the fashion today.)
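To give a flavor of what a “relatively literal description” might look like, here is a minimal Python sketch of a schema-style interface for the airplane example. Everything in it is an assumption made up for illustration: the `Sensor`, `Actuator`, and `SystemDescription` types, the particular sensors and actuators, and their numeric ranges. The point is only that the whole interface is one flat, machine-readable value, with no screens and no curation for a particular moment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str
    unit: str


@dataclass(frozen=True)
class Actuator:
    name: str
    minimum: float  # smallest value the actuator accepts
    maximum: float  # largest value the actuator accepts


@dataclass(frozen=True)
class SystemDescription:
    """A flat listing of everything the system can report or do."""
    sensors: tuple[Sensor, ...]
    actuators: tuple[Actuator, ...]

    def accepts(self, actuator_name: str, value: float) -> bool:
        """Return True if the named actuator exists and the value is in range."""
        for actuator in self.actuators:
            if actuator.name == actuator_name:
                return actuator.minimum <= value <= actuator.maximum
        return False


# Hypothetical entries; a real aircraft would list thousands of these.
AIRPLANE = SystemDescription(
    sensors=(
        Sensor("airspeed", "knots"),
        Sensor("altitude", "feet"),
        Sensor("fuel_level", "percent"),
    ),
    actuators=(
        Actuator("throttle", 0.0, 1.0),
        Actuator("flap_angle", 0.0, 40.0),
    ),
)
```

An agent consuming this description can check a request such as `AIRPLANE.accepts("throttle", 0.5)` directly against the schema, where a human would rely on the cockpit layout to make the legal range obvious.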
The Value of an Open-Source User
There is another interesting shift in the dynamics of software to be used by software. It becomes relatively doable to have a machine-readable model of the user! The result is that the equivalent of usability testing becomes cheaper and can be automated.
The starting point is just creating a test suite for a service, which includes running a variety of common client programs against it. This kind of exercise is already standard today.
However, we can follow a strategy related to the one I wrote about previously, for AI agents coming to trust code provided by their competitors. Consider any user of the service whose source code is known to the service author. That user may be built by the same organization, or it may be released to the world as open source. Now the quality-assurance exercise is to carry out formal verification of the combined system, potentially including the service and all of its known users. Properties may even be checked of emergent behavior that arises from different users interacting with each other. So long as we can formalize, say, some productivity measure for the collective users, we can now (with enough proof effort) guarantee that measure across an infinite variety of scenarios, each lasting for infinite time. It’s a far cry from what is doable within the usability-testing budget for a software program today!
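As a very small stand-in for that kind of combined verification, the sketch below exhaustively explores every reachable state of a toy system: a lock service plus a fixed number of clients whose behavior is fully known. The lock service, the three client phases, and the two properties checked are all assumptions invented for this example. Explicit-state exploration of a finite model is also much weaker than the proofs I have in mind, which would cover unbounded scenarios, but it shows the shape of the exercise: once the users are code, “all user behaviors” is something we can enumerate or reason about.

```python
from collections import deque

IDLE, WAITING, HOLDING = "idle", "waiting", "holding"


def successors(state):
    """All states reachable in one step from `state` (a tuple of client phases)."""
    result = []
    lock_free = HOLDING not in state
    for i, phase in enumerate(state):
        if phase == IDLE:
            nxt = WAITING   # client i asks the service for the lock
        elif phase == WAITING and lock_free:
            nxt = HOLDING   # service grants the free lock to client i
        elif phase == HOLDING:
            nxt = IDLE      # client i finishes its work and releases the lock
        else:
            continue        # client i is blocked until the lock is released
        result.append(state[:i] + (nxt,) + state[i + 1:])
    return result


def explore(num_clients):
    """Breadth-first search over every reachable state of service plus clients."""
    start = (IDLE,) * num_clients
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def check(num_clients):
    """Check two properties over all reachable states of the combined system."""
    reachable = explore(num_clients)
    mutual_exclusion = all(s.count(HOLDING) <= 1 for s in reachable)
    no_deadlock = all(successors(s) for s in reachable)
    return mutual_exclusion, no_deadlock, len(reachable)
```

Calling `check(2)` covers every interleaving of two clients rather than a sample of them, which is the difference in kind from a conventional test suite.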
The change in perspective can really be quite a powerful booster to software development. It’s not just that we want to subject a new service to final checking before releasing it into the wild. We can evaluate usability repeatedly throughout development, to shape all the choices we make. Such checks can work like tool calls in AI coding assistants today. An AI searches the space of programs worth considering, repeatedly checking usability to bias its decisions.
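Here is a deliberately tiny sketch of that loop, under assumptions of my own: the only design choice being searched is a hypothetical `batch_limit` for an upload service, each known client is summarized by how many items it uploads, and “usable” means every known client finishes within a budget of round trips. A real search would range over programs and a real check would be far richer, but the control flow is the same: propose a candidate, call the checker, keep or discard.

```python
import math


def round_trips(batch_limit, client_items):
    """Round trips a known client needs to upload `client_items` items."""
    return math.ceil(client_items / batch_limit)


def usability_check(batch_limit, known_clients, max_round_trips):
    """The 'tool call': does every known client finish within the budget?"""
    return all(
        round_trips(batch_limit, items) <= max_round_trips
        for items in known_clients
    )


def search(candidates, known_clients, max_round_trips):
    """Return the smallest candidate batch limit that passes the check, or None."""
    for batch_limit in sorted(candidates):
        if usability_check(batch_limit, known_clients, max_round_trips):
            return batch_limit
    return None
```

For instance, `search([1, 5, 10, 50], [12, 40, 7], 4)` rejects the first two candidates because some client would need too many round trips, and settles on a batch limit of 10.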
Our AI-centric future may, of course, still include a role for services used by many different types of users, who now correspond to different programs, with different goals, written at different times. We can’t check the service’s usability in advance by reference to the specific source code of users who have not been constructed yet. Here, however, we come to what may be deeply satisfying to software engineers used to negotiating with finicky human users: we can impose a formal specification on the users, declaring that the service’s warranty is voided in the face of deviant behavior. The users can choose to invest in formally verifying themselves, in contrast to humans, who probably wouldn’t even bother to read these terms of service, let alone audit themselves for compliance. (This idea is related to design by contract in formal methods, which forces all parts of a software system to follow a common system of specification and sometimes proof.)
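A minimal sketch of such a user-side specification, again with invented details: the protocol below (a session that must be opened before it is read and closed afterward), the `ALLOWED` table, and the `Service` class are all hypothetical. The specification is a small state machine over requests. A prospective user can check its own planned request sequence with `conforms` ahead of time, and the service tracks at run time whether its warranty still applies.

```python
# The user-side contract: which request is allowed in which session state.
ALLOWED = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}


def conforms(trace):
    """Return True if a whole sequence of requests follows the contract."""
    state = "closed"
    for request in trace:
        key = (state, request)
        if key not in ALLOWED:
            return False
        state = ALLOWED[key]
    return True


class Service:
    """Serves requests and records whether the user has kept to the contract."""

    def __init__(self):
        self.state = "closed"
        self.warranty_valid = True

    def handle(self, request):
        nxt = ALLOWED.get((self.state, request))
        if nxt is None:
            # Deviant behavior: the request is refused and the warranty is voided.
            self.warranty_valid = False
            return "rejected"
        self.state = nxt
        return "ok"
```

A user that wants the guarantees only has to show that every trace it can produce satisfies `conforms`, which is a property of its own code and can be checked without access to the service’s internals.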
Conclusion
As we turn over more of our economy to AI, it becomes important to be able to write down exactly what rules we want those agents to follow. If it is too difficult to formalize our requirements, then formal proof that agents meet them is a nonstarter. User interfaces have been a complex part of software as we know it, which might suggest a tough task ahead in formalizing that part for AI systems. However, if, as I argue, more and more of the activity of AI agents will be interacting with other agents, then there are a few reasons to expect that formalizing their interfaces will be easier than expected (at least for agents operating entirely in the interiors of zones of legibility):
User interfaces have varied to follow trends in fashion, which should not be a concern for AI agents.
User interfaces have been designed around limitations in human working memory, while interfaces for AIs can just present “the facts” in the most direct way.
When the users of a system are themselves programs, it becomes very cheap to test or even formally prove the system against its users, without any of the costs and delays of recruiting people for usability testing. Indeed, a single proof exercise can cover all users within some formally defined category (even an infinite one).
Drawing on these advances, it may become possible to move the equivalent of usability testing into strong formal proof.
The lingering objection I’m expecting at this point, from all you software engineers out there, is that the truly hard part of a project remains figuring out what users really want, which is mostly orthogonal to interface details. I agree: that element is what makes full automation of software engineering in today’s world a truly “AI-hard” problem! But in a world with even more AI centrality, does this aspect remain nearly as difficult? My next post argues why not.