Vague Thoughts and Questions about Agent Structures

by loriphos, 23rd Aug 2019

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Epistemic status: Posting for blog day at MSFP! More trying to figure out what the right definitions are than saying anything concrete. I still don’t really know what agents are, and none of this is math yet. I’m hoping to develop these (and other) ideas more in the future, so any feedback is greatly appreciated.

My Naive Agent Pre-model

Y’know -- agents are, like, things that do things. They have utility functions and stuff. They make choices, whatever that means.

Unfortunately this ‘definition’ isn’t sufficient for making any concrete claims about how agents behave, so I’ve been thinking about some models that might be, and this post contains ideas that came out of that.

Irreducible Agents vs Agent Clusters

Irreducible Agents

An irreducible agent is what I’m calling something that optimizes a really simple utility function in some straightforward sense -- maybe it just does gradient ascent on that function or something. If it has a choice of two actions, it picks the one that results in higher utility every time. (This concept needs a precise definition, but I’m not sure what the right definition is yet, so I’m just trying to point at the thing.)
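To make the thing being pointed at slightly more concrete, here is a minimal sketch of the "always pick the higher-utility action" behavior. It is not a proposed definition. The names `irreducible_choice` and `toy_utility` are made up for illustration, and the finite action set is an assumption the post doesn't commit to.

```python
from typing import Callable, Hashable, Iterable

Action = Hashable


def irreducible_choice(
    actions: Iterable[Action], utility: Callable[[Action], float]
) -> Action:
    """Return the action with the highest utility.

    Ties go to whichever tied action appears first, which is how
    Python's built-in max behaves.
    """
    return max(actions, key=utility)


def toy_utility(action: int) -> float:
    """Hypothetical utility for illustration: prefer actions close to 3."""
    return -float((action - 3) ** 2)


best = irreducible_choice([0, 1, 2, 3, 4, 5], toy_utility)
```

Everything interesting about the agent lives in `utility` here; the choice rule itself has no state and no internal conflict, which is roughly what "irreducible" is meant to suggest.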

It seems like when people talk about agents in the abstract, this is the kind of agent they often mean. But people also sometimes talk about things like humans as agents, and we aren’t really like that. Humans seem to be at least partially made up of smaller agenty parts that have different and sometimes conflicting goals -- more like what I’m calling ‘agent clusters’.

Agent clusters

If you glue together a bunch of irreducible agents in a reasonable way, you could still get something that looks agenty. I can think of a couple of framings for gluing agents together, though there are probably better ones:

One way is by having a sort of meta-agent that turns agents on and off according to some criteria, and the subagent that is turned on gets to decide what to do. I’m not sure this framing makes sense; if you can think of the meta-agent as having a utility function, it seems like the whole thing collapses into an irreducible agent after all. But maybe having the meta-agent use some simple rules that don’t constitute a true utility function could work to build agent clusters that efficiently approximate a more complex utility function than they could explicitly represent.
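Here is a rough sketch of what the meta-agent framing might look like with "simple rules" in place of a utility function. It is just one way to cash the idea out. The hunger/fatigue state, the threshold of 0.7, the first-matching-rule-wins policy, and all the names (`make_subagent`, `meta_agent`, `RULES`) are assumptions invented for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, float]
Subagent = Callable[[State, Sequence[str]], str]
Rule = Tuple[Callable[[State], bool], Subagent]


def make_subagent(utility: Callable[[State, str], float]) -> Subagent:
    """Wrap a utility function as an irreducible agent (argmax over actions)."""
    def act(state: State, actions: Sequence[str]) -> str:
        return max(actions, key=lambda a: utility(state, a))
    return act


def meta_agent(
    state: State, actions: Sequence[str], rules: List[Rule], default: Subagent
) -> str:
    """Hand control to the first subagent whose activation rule fires.

    The rules are plain predicates on the state, not a utility function.
    If no rule fires, the default subagent decides.
    """
    for is_active, subagent in rules:
        if is_active(state):
            return subagent(state, actions)
    return default(state, actions)


# Hypothetical subagents, each caring about exactly one action.
eater = make_subagent(lambda s, a: 1.0 if a == "eat" else 0.0)
sleeper = make_subagent(lambda s, a: 1.0 if a == "sleep" else 0.0)
worker = make_subagent(lambda s, a: 1.0 if a == "work" else 0.0)

ACTIONS = ["eat", "sleep", "work"]
RULES: List[Rule] = [
    (lambda s: s["hunger"] > 0.7, eater),
    (lambda s: s["fatigue"] > 0.7, sleeper),
]

choice = meta_agent({"hunger": 0.9, "fatigue": 0.2}, ACTIONS, RULES, worker)
```

Whether a rule list like this secretly amounts to a utility function over states and actions is exactly the collapse worry above; the sketch doesn't settle it.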

Another way is to think of the agents as voters which rank actions by their utility function and use some voting system to decide what the action of the resulting system will be. (I’m pretty sure Arrow’s Impossibility Theorem doesn’t kill this as a possible structure -- it just says that no voting system can satisfy all of the desiderata for every possible set of voter rankings, which is not surprising.)

(Note that these two framings could be combined -- the meta-agent could activate multiple subagents at the same time and aggregate the opinions of only the activated subagents via some voting system. I’m not sure if this is useful.)
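As a sketch of the electorate framing, including the combined version where only some subagents are switched on, here is a toy aggregator. The Borda count is my stand-in for "some voting system"; the post doesn't pick one. The voter names, their utilities, and the `active` parameter are all hypothetical.

```python
from typing import Callable, Dict, Optional, Sequence

Utility = Callable[[str], float]


def borda_winner(
    actions: Sequence[str],
    utilities: Dict[str, Utility],
    active: Optional[Sequence[str]] = None,
) -> str:
    """Pick an action by Borda count over the active voters.

    Each voter ranks the actions by its own utility; an action earns
    0 points for last place, 1 for second-to-last, and so on. If a voter
    is indifferent between two actions, their order in `actions` breaks
    the tie (an arbitrary choice made for this sketch).
    """
    voters = list(utilities) if active is None else list(active)
    scores = {a: 0 for a in actions}
    for name in voters:
        ranked_worst_to_best = sorted(actions, key=utilities[name])
        for points, action in enumerate(ranked_worst_to_best):
            scores[action] += points
    return max(actions, key=lambda a: scores[a])


ACTIONS = ["a", "b", "c"]
VOTERS: Dict[str, Utility] = {
    "v1": {"a": 3.0, "b": 2.0, "c": 1.0}.__getitem__,  # a > b > c
    "v2": {"a": 1.0, "b": 3.0, "c": 2.0}.__getitem__,  # b > c > a
    "v3": {"a": 1.0, "b": 2.0, "c": 3.0}.__getitem__,  # c > b > a
}

everyone = borda_winner(ACTIONS, VOTERS)
only_v1 = borda_winner(ACTIONS, VOTERS, active=["v1"])
```

With all three voters the cluster picks the compromise action "b", even though only one voter ranks it first; with only "v1" active it just does what "v1" wants. Note that the voters only contribute rankings here, so the magnitudes of their utilities are thrown away.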

Some questions:

Q: Are these two framings (the meta-agent framing and the electorate framing) equivalent? Are there other options, or can any reasonable cluster be described in this way?

Q: Given a description of a (purported) agent as a cluster of irreducible agents in this way, what should it mean for it to be coherent, and how can we tell whether it is coherent? One idea is that, in the electorate model, ‘coherent’ could mean that it is possible to satisfy the desiderata from Arrow’s theorem.

Q: If you use the electorate model, are there simple conditions on the utility functions of the voters under which a coherent agent can be formed?
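One very rough way to start poking at the last two questions: for a specific set of voters, check whether pairwise majority voting produces a transitive preference over actions. This is only a proxy I'm assuming for illustration. Arrow's desiderata are about a voting rule across all possible profiles, whereas this checks a single profile under a single rule. The function names are made up.

```python
from itertools import permutations
from typing import Callable, Sequence

Utility = Callable[[str], float]


def majority_prefers(x: str, y: str, utilities: Sequence[Utility]) -> bool:
    """True if strictly more voters prefer x to y than prefer y to x."""
    for_x = sum(1 for u in utilities if u(x) > u(y))
    for_y = sum(1 for u in utilities if u(y) > u(x))
    return for_x > for_y


def majority_is_transitive(
    actions: Sequence[str], utilities: Sequence[Utility]
) -> bool:
    """Check that strict pairwise-majority preference has no intransitive triple."""
    for x, y, z in permutations(actions, 3):
        if (
            majority_prefers(x, y, utilities)
            and majority_prefers(y, z, utilities)
            and not majority_prefers(x, z, utilities)
        ):
            return False
    return True


ACTIONS = ["a", "b", "c"]

# Classic Condorcet cycle: a > b > c, b > c > a, c > a > b.
cyclic = [
    {"a": 3, "b": 2, "c": 1}.__getitem__,
    {"a": 1, "b": 3, "c": 2}.__getitem__,
    {"a": 2, "b": 1, "c": 3}.__getitem__,
]

# Preferences that are single-peaked along the axis a, b, c.
single_peaked = [
    {"a": 3, "b": 2, "c": 1}.__getitem__,
    {"a": 1, "b": 3, "c": 2}.__getitem__,
    {"a": 1, "b": 2, "c": 3}.__getitem__,
]

cyclic_ok = majority_is_transitive(ACTIONS, cyclic)
single_peaked_ok = majority_is_transitive(ACTIONS, single_peaked)
```

The first electorate cycles (a beats b, b beats c, c beats a), so under this proxy it would not count as coherent. The second one is single-peaked, which is a known sufficient condition for majority rule to be transitive with an odd number of voters, so it may be one candidate for the kind of "simple condition on the utility functions" the last question asks about.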
