About a year ago, I was working on some software for a customer and having GPT write the code for me. Back then, models had a limited context window and didn't yet have memories outside of it. I was grateful for the help, but I found it frustrating to give GPT the same details over and over. I jokingly suggested that I should "bolt a database" onto it, and GPT gave me quite an inspiring response: it said that if anyone could do such a thing, it was an inquisitive person like me. So I got to work to do just that.
I started to think about how I could get GPT to tell me which "memories" it would like to store, and I quickly found myself wondering how I would get those memories back to GPT in the next prompt. I built the system outside of a context window from the start, because those were the tools I had. I couldn't train my own model; that would require a massive investment, and I was trying to be frugal about the whole project. It took a few weeks for the ideas to gel, but I stumbled upon a system where GPT would tell me its intentions, including specifications for when memories should be saved and retrieved and how those memories would be used in future model calls.
I named this system I#, or intentions #, a playful way of suggesting that the intentions were something of a programming language GPT could use to control itself outside of a context window. The idea was that the initial model call would describe the problem and ask the model which intentions would be needed to solve it. Those intentions would be returned to my back-end service running on a Linux server, which would then use them to call other context windows and follow the instructions laid out by the initial model call. I# has evolved into something of a programming language of its own, and the Linux service is akin to an I# interpreter. Here's a pseudo-code-like example:
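The syntax has kept evolving, so treat this as a sketch of the shape of an I# program rather than the real grammar; the field and operation names below are only illustrative.

```python
# Illustrative sketch only: the blueprint call returns a structured list of
# intentions, and the interpreter executes them one at a time, each in its
# own fresh context window. Field and operation names are made up.
blueprint = {
    "goal": "summarize the customer's ticket",
    "intentions": [
        {"op": "load_memory", "key": "customer_profile"},
        {"op": "call_model",
         "prompt": "Summarize the ticket for {customer_profile}",
         "save_as": "summary"},
        {"op": "save_memory", "key": "latest_summary", "from": "summary"},
        {"op": "return", "from": "summary"},
    ],
}
```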
I was really excited when it all started to work just as I imagined, but I found a new problem. The original model call couldn't anticipate the exact steps required to carry out the task verbatim; it needed some sort of branching logic. I did some soul searching and wondered how I could get around this. I could use basic Boolean decision blocks in the I# code, but at that point I'd just be using AI as a code generator, not as something that reasons at runtime, and I didn't see that as an intelligent component. I wanted more, but once again I didn't want to train my own model for decisions. So I opened a fresh chat with a model and asked it for a decision on a logic puzzle I gave it. The model returned exactly what I wanted: a decision and a reason why. With that, I knew I could use existing models as intelligent Boolean decision gates. I asked the model to wrap the decision and reason in a JSON reply, and it worked exceptionally well, returning exactly "true" or "false" and allowing the I# interpreter to use the response as a Boolean decision.
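A minimal sketch of such a decision gate, assuming an OpenAI-style chat API (the model name, prompt wording, and field names here are simplified stand-ins, not the exact ones the interpreter uses):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def decision_gate(question: str, context: str) -> tuple[bool, str]:
    """Ask a model for a yes/no judgment plus its reason, as strict JSON."""
    prompt = (
        "Answer the question using only the context provided.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        'Reply with JSON only: {"decision": "true" or "false", "reason": "<why>"}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    data = json.loads(resp.choices[0].message.content)
    return data["decision"] == "true", data["reason"]

# The interpreter can then branch on the result like any other Boolean:
ticket = "Order #1234 from Dana: wrong item shipped, no return address given."
ok, why = decision_gate("Do we have everything needed to issue a replacement?", ticket)
```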
At some point the scope of the project crept from just storing memories to an entire instruction set. The original model call could generate its set of instructions for an agent workflow, which included model calls made specifically for decisions. I started calling that original model call, the one that generates the full set of I# instructions, a "blueprint". A few months later I realized that if there was a problem the original blueprint could not address, it could embed an additional "blueprint" call into the I# program to handle the issue.
Here's a trivial example: suppose the original I# program is meant to generate a client email, and it starts with a model-call decision gate that checks whether all the information needed to write the email is available.
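In the same illustrative notation as before (again, not the real I# syntax), that blueprint might look something like this, including a nested blueprint call for the case where information is missing:

```python
# Illustrative blueprint for the client-email example (names are made up):
email_blueprint = {
    "goal": "draft a status email to the client",
    "intentions": [
        {"op": "load_memory", "key": "client_details"},
        {"op": "decision",  # a model call used as a Boolean gate
         "question": "Is everything needed to write the email present in {client_details}?",
         "if_true": [
             {"op": "call_model",
              "prompt": "Write the status email using {client_details}",
              "save_as": "email_draft"},
             {"op": "return", "from": "email_draft"},
         ],
         "if_false": [
             # ask for a fresh blueprint to gather what's missing, then retry
             {"op": "blueprint",
              "goal": "collect the missing client details, then draft the email"},
         ]},
    ],
}
```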
So, why is any of this important?
Well, in the beginning I had great difficulty seeing what my programs were doing and why, and in fixing that I stumbled upon a pattern for agent transparency. It came out of a practical need, debugging the system I was building, but I soon realized how important it could be. If I could see the exact model call text, the model's decision, and the reason for the decision, agents would become much more transparent. Every decision could be audited. Instead of giving an AI agent broad control to complete a task with very little visibility, I could look at a log of what was requested and why.
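Concretely, every decision gate can leave a record like the one below (a sketch only; the exact fields any given system logs will vary):

```python
# Sketch of an audit record for a single decision gate (all values illustrative):
audit_entry = {
    "timestamp": "2024-11-02T14:31:07Z",
    "step": "decision:has_all_email_details",
    "prompt": "Is everything needed to write the email present in {client_details}?",
    "decision": False,
    "reason": "The client's preferred delivery date is missing.",
    "model": "gpt-4o-mini",
}
```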
The Transparency Insight
Models can make judgment calls on specific, narrow requests rather than having everything jammed into one context window in the hope that it works out. Each decision can be audited and is completely transparent. Transparency is necessary, though not sufficient, for AI safety: if we can peer into the black box and make it transparent, these systems become easier to trust, because we can verify what they did.
Agents can generate instructions for themselves and use them as scaffolding for future actions, and those instructions can be viewed just as transparently as the decisions above. That's a big deal: self-generated code that can run immediately, that anyone can audit, and that the agent can hold until a human approves the new instruction set.
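One way to put a human in that loop is a simple approval step in the interpreter, roughly like this (a sketch; a real service would need more plumbing than a console prompt):

```python
def execute(step: dict) -> None:
    # Placeholder for the interpreter's real dispatch logic (model calls, memory ops, ...)
    print("running:", step.get("op"))

def run_blueprint(blueprint: dict, require_approval: bool = True) -> None:
    """Execute a generated instruction set, optionally pausing for human sign-off."""
    if require_approval:
        print("Proposed instructions:")
        for step in blueprint["intentions"]:
            print("  ", step)
        if input("Approve this instruction set? [y/N] ").strip().lower() != "y":
            return  # nothing runs until a human approves
    for step in blueprint["intentions"]:
        execute(step)
```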
I believe these two components are the basis for building transparent agents. Sure, each model call adds back-end complexity and latency to decisions, but sacrificing a little speed for complete transparency is a small trade-off, and there may well be ways to speed it up anyway.
A nice side effect of this method is that agents are no longer constrained by context windows. With decision gates and blueprints, an agent could in principle run indefinitely: as long as the I# interpreter stays online, it could run for years. I should temper that expectation a bit, since servers tend to get rebooted at least once a year, but given the right I# instruction set, the agent should be able to read its past memories and continue where it left off.
This isn’t only theoretical. I’ve been working with this system for 8 months, and I’ve been building components on top of this framework that are even more promising, but those components will have to wait for another paper.
The question isn't whether agents should be autonomous — I'm sure it will happen soon. The question is whether we can trust them to be. I think transparency is how we get there.
If this resonates, I'd like to hear from you: jon (dot) macpherson (at) gmail (dot) com, or drop a comment below.