Does Agent-like Behavior Imply Agent-like Architecture?

Scott Garrabrant

[ Question ]

23rd Aug 2019

Tags: Agency, Goal-Directedness, AI

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is not a well-specified question. I don't know what "agent-like behavior" or "agent-like architecture" should mean. Perhaps the question should be "Can you define the fuzzy terms such that 'Agent-like behavior implies agent-like architecture' is true, useful, and in the spirit of the original question?" I mostly think the answer is no, but it seems like it would be really useful to know if true, and the process of trying to make this true might help us triangulate what we should mean by agent-like behavior and agent-like architecture.

Now I'll say some more to try to communicate the spirit of the original question. First, a giant look-up table (GLUT) is not (directly) a counterexample. This is because it might be that the only way to produce an agent-like GLUT is to use agent-like architecture to search for it. Similarly, a program that outputs all possible GLUTs is also not a counterexample, because you might have to use your agent-like architecture to point at the specific counterexample. A longer version of the conjecture is "If you see a program that implements agent-like behavior, there must be some agent-like architecture in the program itself, in the causal history of the program, or in the process that brought your attention to the program." The pseudo-theorem I want is similar to the claim that correlation really does imply causation, or to the good regulator theorem.
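To make the GLUT point concrete, here is a minimal Python sketch. Everything in it (the five-position toy world, `GOAL`, `model`, `search_policy`, `glut_policy`) is a hypothetical illustration and not something from the original question. It shows a look-up table that behaves exactly like a searching policy, but only because a searching policy sat in its causal history.

```python
from typing import Dict

# Hypothetical toy world (an assumption for illustration): positions 0..4 on a line.
STATES = range(5)
ACTIONS = (-1, 0, 1)  # step left, stay, step right
GOAL = 3              # hypothetical goal position


def model(state: int, action: int) -> int:
    """Toy world model: predicted next position, clamped to the line."""
    return min(max(state + action, 0), 4)


def search_policy(state: int) -> int:
    """Agent-like: search over actions for the one the model predicts ends closest to GOAL."""
    return min(ACTIONS, key=lambda a: abs(model(state, a) - GOAL))


# The table is produced by running the search once per state.
# At run time it contains no model, no goal, and no search.
GLUT: Dict[int, int] = {s: search_policy(s) for s in STATES}


def glut_policy(state: int) -> int:
    """Table-driven: a bare dictionary lookup."""
    return GLUT[state]


if __name__ == "__main__":
    for s in STATES:
        print(s, search_policy(s), glut_policy(s))
```

The two policies are indistinguishable from the outside, and the only place the "agent-like" structure appears is in how `GLUT` was constructed. That is the sense in which the table is not directly a counterexample.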

One way of defining agent-like behavior is as that which can only be produced by an agent-like architecture. This makes the theorem trivial, and the challenge is making the theorem non-vacuous. In this light, the question is something like "Is there some nonempty class of architectures that can reasonably be described as a subclass of 'agent-like' such that the class can be equivalently specified either functionally or syntactically?" This looks like it might conflict with the spirit of Rice's theorem, but I think making it probabilistic and referring to the entire causal history of the algorithm might give it a chance of working.

One possible way of defining agent-like architecture is something like "Has a world model and a goal, and searches over possible outputs to find one such that the model believes that output leads to the goal." Many words in this will have to be defined further. A world model might be something that has high logical mutual information with the environment. It might be hard to define search generally enough to include everything that counts as search. There also might be completely different ways to define agent-like architecture. Do whatever makes the theorem true.
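As one very narrow reading of the "world model, goal, search" description, the following Python sketch may help fix ideas. The function `plan`, the toy model `toy_model`, the outputs `"inc"` and `"double"`, and the brute-force enumeration are all assumptions made for illustration. This is just one of many things that could count as search, not a proposed definition.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple, TypeVar

State = TypeVar("State")
Output = TypeVar("Output")


def plan(
    state: State,
    outputs: Sequence[Output],
    model: Callable[[State, Output], State],  # world model: predicts the next state
    goal: Callable[[State], bool],            # goal: predicate on predicted states
    horizon: int,
) -> Optional[Tuple[Output, ...]]:
    """Search output sequences (shortest first) for one the model believes reaches the goal."""
    for length in range(horizon + 1):
        for candidate in product(outputs, repeat=length):
            predicted = state
            for out in candidate:
                predicted = model(predicted, out)
            if goal(predicted):
                return candidate
    return None  # no sequence within the horizon is believed to reach the goal


def toy_model(n: int, out: str) -> int:
    """Hypothetical world model: 'inc' adds one, 'double' multiplies by two."""
    return n + 1 if out == "inc" else n * 2


if __name__ == "__main__":
    # Starting from 1, look for a sequence the model believes ends at 10.
    print(plan(1, ("inc", "double"), toy_model, lambda n: n == 10, horizon=5))
```

Note that the sketch only checks what the model predicts. Whether the output actually leads to the goal depends on how well `toy_model` tracks the real environment, which is where a notion like mutual information with the environment would have to come in.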
