One further axis is how the processes implementing these logical possibilities are structured: are they pieces of traditional software, or are they models themselves (as in, e.g., some recent Sakana papers and the new product based on those papers, https://sakana.ai/fugu/)?
Research on agentic frameworks has produced some very interesting results; from different topologies to different ways of using tool calls, it has been one of the most fascinating and accessible areas of research in the AI landscape. In this essay, I’d like to talk about some core structures that sit at the heart of the various agentic framework applications we have seen, and some (possibly) new directions I’d like to explore.
I.
We must, before anything else, define the most primitive tool we have: a simple text-in-text-out (TITO) API call. Following that, we have perhaps the most important one: JSON, or structured, outputs. Taking these two API calls, we can come up with four different paradigms (methodologies?) for creating frameworks, and explore which ones accomplish certain goals better than a simple TITO API call.
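To make the two primitives concrete, here is a minimal sketch. It assumes nothing about any particular provider: `LLM` is just a hypothetical function from string to string, `fake_llm` is a canned stand-in so the example runs offline, and the retry-and-validate loop is only one possible way to get structured output out of a plain text call.

```python
import json
from typing import Callable

# Hypothetical stand-in for a real model endpoint: any function str -> str.
LLM = Callable[[str], str]


def tito(llm: LLM, prompt: str) -> str:
    """Primitive 1: text in, text out."""
    return llm(prompt)


def structured(llm: LLM, prompt: str, required_keys: list[str], retries: int = 2) -> dict:
    """Primitive 2: ask for JSON, parse it, and retry if it is malformed."""
    instruction = f"{prompt}\nReply with a JSON object containing the keys: {required_keys}"
    for _ in range(retries + 1):
        raw = tito(llm, instruction)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: ask again
        if isinstance(data, dict) and all(key in data for key in required_keys):
            return data
    raise ValueError("model did not return valid structured output")


def fake_llm(prompt: str) -> str:
    # Canned responses so the sketch runs without any network access.
    if "JSON" in prompt:
        return '{"sentiment": "positive", "score": 0.9}'
    return "The review is positive."


text_answer = tito(fake_llm, "Classify this review.")
json_answer = structured(fake_llm, "Classify this review.", ["sentiment", "score"])
```

The point of the second primitive is that its result is a value ordinary code can branch on, which is what lets an outside program (or another model) decide what to call next.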
II.
With these, we basically cover the core primitives of the different methodologies. A further step forward (at the cost of high token usage) would be to mix and match these structures and observe what works best. A few ways to mix and match:
III.
Although the above architectures seem very simple to implement, they require careful calibration as well as error and context management, and we cannot forget that agentic frameworks only work because the frontier AI labs specifically fine-tune their models to perfect tool-calling. Even according to the researchers of the RLM paper, the models would benefit a lot from being fine-tuned on the self-calling tool.
One other problem is that of coordination. LLM agents that are unaware of other agents' contexts and changes can often become counter-productive and degrade the work overall. This seems to be a distributed systems problem, and there are some very interesting implementations that try to tackle the coordination problem by using techniques from distributed systems theory [4].
The opposite of the distributed setting would be a set-up I’d call the Stigmergical framework [5]. In short, each LLM/API call is given a task which modifies an environment. An agent’s actions are thus “communicated” through the medium, and not by any direct means. This shifts the task of communication from the agent to the medium it’s working in. This method is truly beneficial when it comes to scale: a thousand agents communicating directly with each other is a combinatorial explosion, while a thousand agents that each only read or modify a medium, with the medium carrying the “information”, can coordinate much more effectively.
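A minimal sketch of this idea follows. Everything in it is an illustrative assumption rather than a reference design: `Medium`, `make_agent`, and `run` are hypothetical names, and each `work` function is a stub standing in for an LLM call. The agents never reference one another; each one only checks whether the medium contains what it needs and, if so, writes its result back.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Medium:
    """Shared environment: agents read and write it, never each other."""
    entries: dict[str, str] = field(default_factory=dict)
    log: list[tuple[str, str]] = field(default_factory=list)

    def write(self, agent: str, key: str, value: str) -> None:
        self.entries[key] = value
        self.log.append((agent, key))


Agent = Callable[[Medium], bool]  # returns True if the agent changed the medium


def make_agent(name: str, needs: list[str], produces: str,
               work: Callable[[list[str]], str]) -> Agent:
    def agent(medium: Medium) -> bool:
        if produces in medium.entries:
            return False  # someone already left this trace
        if any(key not in medium.entries for key in needs):
            return False  # inputs not in the medium yet
        inputs = [medium.entries[key] for key in needs]
        medium.write(name, produces, work(inputs))  # work() stands in for an LLM call
        return True
    return agent


def run(medium: Medium, agents: list[Agent], max_rounds: int = 10) -> Medium:
    for _ in range(max_rounds):
        changed = [agent(medium) for agent in agents]
        if not any(changed):
            break  # nothing left to react to
    return medium


medium = Medium(entries={"task": "summarise the report"})
agents = [  # deliberately listed out of order
    make_agent("reviewer", ["draft"], "review", lambda xs: f"review of [{xs[0]}]"),
    make_agent("writer", ["outline"], "draft", lambda xs: f"draft from [{xs[0]}]"),
    make_agent("outliner", ["task"], "outline", lambda xs: f"outline for [{xs[0]}]"),
]
run(medium, agents)
order = [name for name, _ in medium.log]
```

Even though the agents are listed in the wrong order, the work still happens as outliner, writer, reviewer, because the sequencing lives in the medium rather than in any agent. On the scale argument: n agents talking pairwise need n(n-1)/2 channels (499,500 for a thousand agents), whereas with a shared medium each agent needs only its one link to the medium.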
One structure that seems very logical right now is a hierarchical one, where a few Recursive Agents (LLM API plus self-call tool plus loop) can delegate certain tasks as sub-tasks to other agents (with or without the self-call tool; maybe the LLMs can decide for themselves?), while coordinating with each other (through some means; here we can take some inspiration from distributed systems).
Recursive Agents coordinate, and delegate to other recursive agents or normal agents
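The delegation half of this structure can be sketched as below. This is a toy under stated assumptions, not the RLM paper's method: `split`, `solve`, and `merge` are hypothetical stubs standing in for model calls, and the `max_depth` cut-off is my own device for withholding the self-call tool so that the deepest agents behave as normal agents. Coordination between sibling agents is left out entirely.

```python
from typing import Callable, Optional

Split = Callable[[str], list[str]]       # model proposes sub-tasks; [] means "solve directly"
Solve = Callable[[str], str]             # normal agent: one call, no self-call tool
Merge = Callable[[str, list[str]], str]  # combine the sub-results for the parent task


def recursive_agent(task: str, split: Split, solve: Solve, merge: Merge,
                    depth: int = 0, max_depth: int = 3,
                    trace: Optional[list[tuple[int, str]]] = None) -> str:
    """An agent with a self-call tool: it may spawn copies of itself for sub-tasks."""
    if trace is not None:
        trace.append((depth, task))
    # At the depth limit the self-call tool is withheld, so the agent must answer itself.
    subtasks = split(task) if depth < max_depth else []
    if not subtasks:
        return solve(task)
    results = [
        recursive_agent(sub, split, solve, merge, depth + 1, max_depth, trace)
        for sub in subtasks
    ]
    return merge(task, results)


# Toy stubs so the sketch runs offline; a real system would call a model here.
def toy_split(task: str) -> list[str]:
    parts = task.split(" and ")
    return parts if len(parts) > 1 else []


def toy_solve(task: str) -> str:
    return f"done({task})"


def toy_merge(task: str, results: list[str]) -> str:
    return " + ".join(results)


trace: list[tuple[int, str]] = []
result = recursive_agent("parse logs and build report and send email",
                         toy_split, toy_solve, toy_merge, trace=trace)
flat = recursive_agent("parse logs and build report",
                       toy_split, toy_solve, toy_merge, max_depth=0)
```

Here `trace` records one call at depth 0 and three at depth 1, while the `max_depth=0` call shows the same agent degrading to a single direct answer once the self-call tool is unavailable.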
IV.
With this, I have tried to exhaustively cover all the ways in which we can make LLM API calls. My assumption here is that as the models become smarter, and if the frontier AI labs think it’s worth fine-tuning the models for the self-call tool, we will be able to build increasingly complex systems. Right now, the only problem I can foresee with “swarm” intelligence, or deep sub-agentic setups, is model confusion or deviation from the goals.
Still, I do think that if the models can effectively navigate a multi-agent setup (either with or without self-call), we could get to solutions for much more complex problems, with the type of emergent solutions we observe in human societies. Coordination at scale can solve truly complicated problems.
Thanks for reading this far!
During implementation, however, we must not forget that the looping paradigm only works if we use the ReAct method: the model must plan everything out ahead in reasoning traces, before the execution begins.
In case the branching model and the recursive model appear similar, one difference between them is that the LLM API call in the former never gets access to a self-calling tool, and the subsequent calls are made by outside orchestrator code. In the case of RLMs, however, the LLM does get access to a self-calling tool.
I asked an LLM this question and asked it to provide me with an example. One example it suggested was an algorithmic trading bot, which consumes market data tick by tick and performs analysis on it.
A thread of me discussing this.
This is actually a well-known design pattern, called the Blackboard pattern.