Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is not a well-specified question. I don't know what "agent-like behavior" or "agent-like architecture" should mean. Perhaps the question should be "Can you define the fuzzy terms such that 'Agent-like behavior implies agent-like architecture' is true, useful, and in the spirit of the original question." I mostly think the answer is no, but it seems like it would be really useful to know if true, and the process of trying to make this true might help us triangulate what we should mean by agent-like behavior and agent-like architecture.

Now I'll say some more to try to communicate the spirit of the original question. First a giant look-up table is not (directly) a counterexample. This is because it might be that the only way to produce an agent-like GLUT is to use agent-like architecture to search for it. Similarly a program that outputs all possible GLUTs is also not a counterexample because you might have to use your agent-like architecture to point at the specific counterexample. A longer version of the conjecture is "If you see a program implements agent-like behavior, there must some agent-like architecture in the program itself, in the causal history of the program, or in the process that brought your attention to the program." The pseudo-theorem I want is similar to the claim that correlation really does imply causation or the good regulator theorem.

One way of defining agent-like behavior as that which can only be produced by an agent-like architecture. This makes the theorem trivial, and the challenge is making the theorem non-vacuous. In this light, the question is something like "Is there some nonempty class of architectures that can reasonably be described as a subclass of 'agent-like' such that the class can be equivalently specified either functionally or syntactically?" This looks like it might conflict with the spirit of Rice's theorem, but I think making it probabilistic and referring to the entire causal history of the algorithm might give it a chance of working.

One possible way of defining agent-like architecture is something like "Has a world model and a goal, and searches over possible outputs to find one such that the model believes that output leads to the goal." Many words in this will have to be defined further. World model might be something that has high logical mutual information with the environment. It might be hard to define search generally enough to include everything that counts as search. There also might be completely different ways to define agent-like architecture. Do whatever makes the theorem true.

New Answer
New Comment

2 Answers sorted by

Conjecture: Every short proof of agentic behavior points out agentic architecture.

Let's say "agent-like behavior" is "taking actions that are more-likely-than-chance to create an a-priori-specifiable consequence" (this definition includes bacteria).

Then I'd say this requires "agent-like processes", involving (at least) all 4 of: (1) having access to some information about the world (at least the local environment), including in particular (2) how one's actions affect the world. This information can come either baked into the design (bacteria, giant lookup table), and/or from previous experience (RL), and/or via reasoning from input data. It also needs (3) an ability to use this information to choose actions that are likelier-than-chance to achieve the consequence in question (again, the outcome of this search process could be baked into the design like bacteria, or it could be calculated on-the-fly like human foresight), and of course (4) a tendency to actually execute those actions in question.

I feel like this is almost trivial, like I'm just restating the same thing in two different ways... I mean, if there's no mutual information between the agent and the world, its actions can only be effective only insofar as the exact same action would be effective when executed in a random location of a random universe. (Does contracting your own muscle count as "accomplishing something without any world knowledge"?)

Anyway, where I'm really skeptical here is in the term "architecture". "Architecture" in everyday usage usually implies software properties that are obvious parts of how a program is built, and probably put in on purpose. (Is there a more specific definition of "architecture" you had in mind?) I'm pretty doubtful that the ingredients 1-4 have to be part of the "architecture" in that sense. For example, I've been thinking a lot about self-supervised learning algorithms, which have ingredient (1) by design and have (3) sorta incidentally. The other two ingredients (2) and (4) are definitely not part of the "architecture" (in the sense above). But I've argued that they can both occur as unintended side-effects of its operation: See here, and also here for more details about (2). And thus I argue at that first link that this system can have agent-like behavior.

(And what's the "architecture" of a bacteria anyway? Not a rhetorical question.)

Sorry if this is all incorrect and/or not in the spirit of your question.

Does your definition of agentic behavior include the behavior of a star?

1Steven Byrnes4y
No, I would try to rule out stars based on "a-priori-specifiable consequence"—saying that "stars shine" would be painting a target around the arrow, i.e. reasoning about what the system would actually do and then saying that the system is going to do that. For example, expecting bacteria to take actions that maximize inclusive genetic fitness would certainly qualify as "a priori specifiable". The other part is "more likely than chance", which I suppose entails a range of possible actions/behaviors, with different actions/behaviors invoked in different possible universes, but leading to the same consequence regardless. (You can see how every step I make towards being specific here is also a step towards making my "theorem" completely trivial, X=X.)
4 comments, sorted by Click to highlight new comments since: Today at 1:58 AM

Consider the Sphex wasp, doing the same thing in response to the same stimulus. Would you say that this is not an agent, or would you say that it is part of an agent, and that extended agent did search in a "world model" instantiated in the parts of the world inhabited by ancestral wasps?

At this point, if you allow "world model" to be literally anything with mutual information including other macroscopic situations in the world, and "search" to be any process that gives you information about outcomes, then yes, I think you can guarantee that, probabilistically, getting a specific outcome requires information about that outcome (no free lunch), which implies "search" on a "world model." As for goals, we can just ignore the apparent goals of the Sphex wasp and define a "real" agent (evolution) to have a goal defined by whatever informative process was at work (survival).

I think I do want to make my agent-like architecture general enough to include evolution. However, there might be a spectrum of agent-like-ness such that you can't get much more than Sphex behavior with just evolution (without having a mesa-optimizer in there)

I think you can guarantee that, probabilistically, getting a specific outcome requires information about that outcome (no free lunch), which implies "search" on a "world model."

Yeah, but do you think you can make it feel more like a formal proof?

Would it perhaps be helpful to think of agent-like behavior as that which takes abstractions as inputs, rather than only raw physical inputs? e.g. an inanimate object such as a rock only interacts with the world on the level of matter, not on the level of abstraction. A rock is affected by wind currents according to the same laws, regardless of  the type of wind (breeze, tornado, hurricane), while an agent may take different actions or assume different states dependent on the abstractions the wind has been reduced to in its world model. 

This is because it might be that the only way to produce an agent-like GLUT is to use agent-like architecture to search for it.

Or to make a self-modifying lookup table. (I wonder if Google docs supports that.) There's also a theory that agents are effective, so a searches for good "plans" can produce/find them.

One possible way of defining agent-like architecture is

It stops when certain conditions have been "found", it "moves" otherwise.

is something like "Has a world model and a goal, and searches over possible outputs to find one such that

Does an SAT solver qualify?

the model believes that output leads to the goal."

Also, some distinguish between "believes the goal will be achieved via that output" and "verifies the goal is achieved".