I see a circularity problem in how folk talk about "agents". I doubt I'm the first to notice this problem. So I wonder what the standard reductionist materialist answer is.

The puzzle is in the title.

To add a few more words:

I have no problem with things like "rock" even though there are no rocks in the absolute universe. We're using many layers of abstraction. We can in principle break down (even literally!) what a rock is and go as far down toward the root (quantum math or whatever) as we like.

(It's unclear to me what the floor is that we reduce things to in reductionist materialism. But I'm okay handwaving that for now by saying "something something math-and-physics something something".)

Something like "The Odyssey" is trickier, but not too much so. It just requires that we add in some stuff about how brains work. Nothing too mysterious there. Unknown, sure, but not fundamentally mysterious.

But this seems to get very weird once you start talking about agents.

At first blush it looks the same as "The Odyssey". An agent is just an abstraction, right? Implemented by some unknown but fundamentally non-mysterious process in a brain.

But this is circular. An abstraction for whom? What even is an abstraction, when you're in the process of defining an agent? Is there some agent-free definition of an abstraction implicitly being invoked here?

I keep seeing people bump into this circularity when talking about AGI as an agent, and what alignment is. It's like folks' native intuitions assume that both quarks and agents are ontologically fundamental, but when pushed on this point they insist that really only the quarks are real… without anything that looks even vaguely to me like a justification for how you'd construct an agent out of quarks, or what that would mean.

In most spaces I'd assume this is because people just didn't finish thinking this through.

I'm guessing that someone somewhere has noticed this and has given this careful thought. And maybe it's even part of the core LW philosophy and I somehow missed it.

So I'm asking:

What is an agent in reductionist materialism?



6 Answers

My impression is that a ton of work at MIRI (and some related research lines in other places) went into answering this question, and indeed, no one knows the answer very crisply right now and yup that's alarming. 

See John Wentworth's post "Why Agent Foundations? An Overly Abstract Explanation", which discusses the need to find the True Name of agents.

(Also, while I agree agents are "more mysterious than rocks or The Odyssey", I'm actually confused why the circularity is particularly the problem here. Why doesn't the Odyssey also run into the Abstraction for Whom problem?)

(Also, while I agree agents are "more mysterious than rocks or The Odyssey", I'm actually confused why the circularity is particularly the problem here. Why doesn't the Odyssey also run into the Abstraction for Whom problem?)

Oh, I think it does, actually. It's just less immediate or central. Like, it's easy for me to imagine putting a copy of The Odyssey on a computer. It's damn near impossible for me to describe what putting "an agent" on my computer is, as opposed to some other kind of program. I was just trying to point at the center of the problem is all, and set aside the usual layers-of-abstraction "explanation" I'm used to hearing for this.

For low-bar definitions of "agent", you're probably running some already. "In computer science, a software agent is a computer program that acts for a user or other program in a relationship of agency, which derives from the Latin agere (to do): an agreement to act on one's behalf. Such 'action on behalf of' implies the authority to decide which, if any, action is appropriate. Agents are colloquially known as bots, from robot." So, if you find agents weird and mysterious, you are probably using a different definition from Wikipedia's article on software agents.

I usually think of this in terms of Dennett's concept of the intentional stance, according to which there is no fact of the matter of whether something is an agent or not. But there is a fact of the matter of whether we can usefully predict its behavior by modeling it as if it were an agent with some set of beliefs and goals.

For example, even though the calculations of a chess-playing computer have practically nothing in common with human thought, its moves can still be effectively predicted by assuming that it “wants” to win at chess and “knows” the rules of chess. This gives rise to the prediction that it will always choose, from the list of viable moves, one which best furthers the goal of winning the game. Even though the best move may not be obvious, adopting the intentional stance still allows the human observer to improve on their predictions of what the computer would do, by eliminating obvious bad moves.
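The "eliminating obvious bad moves" point can be made concrete with a toy sketch. Everything here is an assumption for illustration, not Dennett's formalism: positions are just lists of six move values, the opaque system picks the best one, and the observer only sees whether each move is obviously bad (negative) or not. The observer then spreads its prediction evenly over the moves that survive.

```python
import random

def legal_moves(rng, n_moves=6):
    """Hypothetical position: each legal move has a true value in [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n_moves)]

def system_choice(values):
    """The opaque system: picks the index of the highest-valued move."""
    return max(range(len(values)), key=values.__getitem__)

def intentional_candidates(values):
    """Observer's model: 'it wants to win', so drop obviously bad moves.

    The observer only sees the sign of each value, not its size.
    If every move looks bad, nothing can be ruled out.
    """
    good = [i for i, v in enumerate(values) if v >= 0]
    return good or list(range(len(values)))

def average_probability_on_actual_move(n_positions=200, seed=0):
    """Average probability each observer assigns to the move actually played."""
    rng = random.Random(seed)
    baseline_total = 0.0
    intentional_total = 0.0
    for _ in range(n_positions):
        values = legal_moves(rng)
        actual = system_choice(values)
        # Baseline observer: uniform over all legal moves.
        baseline_total += 1 / len(values)
        # Intentional-stance observer: uniform over the non-bad moves.
        candidates = intentional_candidates(values)
        if actual in candidates:
            intentional_total += 1 / len(candidates)
    return baseline_total / n_positions, intentional_total / n_positions

baseline, intentional = average_probability_on_actual_move()
print(f"uniform guess: {baseline:.3f}, intentional stance: {intentional:.3f}")
```

The observer never learns how the system computes its move. It only assumes the system "wants" the good outcome, and that alone puts more probability on what the system actually does.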

There is no observer-independent “fact of the matter” of whether a system is or is not an “agent”. However, there is an objective fact of the matter about how well a particular system’s behavior is modeled by the intentional stance, from the point of view of a given observer. There are, objectively, patterns in the observable behavior of an intentional system that correspond to what we call “beliefs” and “desires”, and these patterns explain or predict the behavior of the system unusually well (but not perfectly) for how simple they are. [...]

There are several approaches one might take to predicting the future behavior of some system; Dennett compares three: the physical stance, the design stance, and the intentional stance.

In adopting the physical stance towards a system, you utilize an understanding of the laws of physics to predict a system’s behavior from its physical constitution and its physical interactions with its environment. One simple example of a situation where the physical stance is most useful is in predicting the trajectory of a rock sliding down a slope; one would be able to get very precise and accurate predictions with knowledge of the laws of motion, gravitation, friction, etc. In principle (and presuming physicalism), this stance is capable of predicting in full the behavior of everything from quantum mechanical systems to human beings to the entire future of the whole universe.

With the design stance, by contrast, “one ignores the actual (possibly messy) details of the physical constitution of an object, and, on the assumption that it has a certain design, predicts that it will behave as it is designed to behave under various circumstances.” For example, humans almost never consider what their computers are doing on a physical level, unless something has gone wrong; by default, we operate on the level of a user interface, which was designed in order to abstract away messy details that would otherwise hamper our ability to interact with the systems.

Finally, there’s the intentional stance:

Here is how it works: first you decide to treat the object whose behavior is to be predicted as a rational agent; then you figure out what beliefs that agent ought to have, given its place in the world and its purpose. Then you figure out what desires it ought to have, on the same considerations, and finally you predict that this rational agent will act to further its goals in the light of its beliefs. A little practical reasoning from the chosen set of beliefs and desires will in many—but not all—instances yield a decision about what the agent ought to do; that is what you predict the agent will do.

Before further unpacking the intentional stance, one helpful analogy might be that the three stances can be understood as providing gears-level models for the system under consideration, at different levels of abstraction. For purposes of illustration, imagine we want to model the behavior of a housekeeping robot:

  • The physical stance gives us a gears-level model where the gears are the literal gears (or other physical components) of the robot.
  • The design stance gives us a gears-level model where the gears come from the level of abstraction at which the system was designed. The gears could be e.g. the CPU, memory, etc., on the hardware side, or on the level of the robot’s user interface, on the software side.
  • The intentional stance gives us a gears-level model where the relevant gears are the robot’s beliefs, desires, goals, etc. [...]

Now that he’s described how we attribute beliefs and desires to systems that seem to us to have intentions of one kind or another, “the next task would seem to be distinguishing those intentional systems that really have beliefs and desires from those we may find it handy to treat as if they had beliefs and desires.” (For example, although a thermostat’s behavior can be understood under the intentional stance, most people intuitively feel that a thermostat doesn’t “really” have beliefs.) This, however, cautions Dennett, would be a mistake.

As a thought experiment, Dennett asks us to imagine that some superintelligent Martians descend upon us; to them, we’re as simple as thermostats are to us. If they were capable of predicting the activities of human society on a microphysical level, without ever treating any of us as intentional systems, it seems fair to say that we wouldn’t “really” be believers, to them. This shows that intentionality is somewhat observer-relative—whether or not a system has intentions depends on the modeling capabilities of the observer.

However, this is not to say that intentionality is completely subjective, far from it—there are objective patterns in the observables corresponding to what we call “beliefs” and “desires.” (Although Dennett is careful to emphasize that these patterns don’t allow one to perfectly predict behavior; it’s that they predict the data unusually well for how simple they are. For one, your ability to model an intentional system will fail under certain kinds of distributional shifts; analogously, understanding a computer under the design stance does not allow one to make accurate predictions about what it will do when submerged in liquid helium.) [...]

If something appears agent-y to us (i.e., we intuitively use the intentional strategy to describe its behavior), our next question tends to be, “but is it really an agent?” (It’s unclear what exactly is meant by this question in general, but it might be interpreted as asking whether some parts of the system correspond to explicit representations of beliefs and/or desires.) In the context of AI safety, we often talk about whether or not the systems we build “will or won’t be agents,” whether or not we should build agents, etc.

One of Dennett’s key messages with the intentional stance is that this is a fundamentally confused question. What it really and truly means for a system to “be an agent” is that its behavior is reliably predictable by the intentional strategy; all questions of internal cognitive or mechanistic implementation of such behavior are secondary. (Put crudely, if it looks to us like an agent, and we don’t have an equally-good-or-better alternative for understanding that system’s behavior, well, then it is one.) In fact, once you have perfectly understood the internal functional mechanics of a system that externally appears to be an agent (i.e. you can predict its behavior more accurately than with the intentional stance, albeit with much more information), that system stops looking like “an agent,” for all intents and purposes. (At least, modeling the system as such becomes only one potential model for understanding the system’s behavior, which you might still use in certain contexts e.g. for efficient inference or real-time action.)

We should therefore be more careful to recognize that the extent to which AIs will “really be agents” is just the extent to which our best model of their behavior is of them having beliefs, desires, goals, etc. If GPT-N appears really agent-y with the right prompting, and we can’t understand this behavior under the design stance (how it results from predicting the most likely continuation of the prompt, given a giant corpus of internet text) or a “mechanistic” stance (how individual neurons, small circuits, and/or larger functional modules interacted to produce the output), then GPT-N with that prompting really is an agent.

I find it difficult to believe that there can be no objective criteria for recognising agency when there are objective criteria for building agents.

If you are willing to countenance counterfactuals, it's possible to get more rigorous about "seems like an agent". A system is goal-driven if it would have displayed different behaviour under different circumstances to achieve the same goal, i.e. it avoids obstacles. A system has a utility function if there is part of the system you can change to achieve different goals, in the preceding sense.
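A rough sketch of that counterfactual test, under assumptions of my own: the "circumstances" are small grid worlds with or without a wall, and the two systems being compared (`planner` and `fixed_script`) are hypothetical stand-ins. The test asks whether a system still reaches the same goal when the world changes, and whether it takes a different route to do so.

```python
from collections import deque

def planner(grid, start, goal):
    """Breadth-first search over free cells ('.'); walls are '#'."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # goal unreachable

def fixed_script(grid, start, goal):
    """Replays 'step right four times' and ignores the goal entirely."""
    r, c = start
    path = [start]
    for _ in range(4):
        c += 1
        if grid[r][c] == '#':
            return path  # blocked: the script simply stops
        path.append((r, c))
    return path

def is_goal_driven(system, worlds, start, goal):
    """Counterfactual test: same goal reached in every world, by varying routes."""
    paths = [system(world, start, goal) for world in worlds]
    reaches_goal = all(p is not None and p[-1] == goal for p in paths)
    routes_vary = len({tuple(p) for p in paths if p is not None}) > 1
    return reaches_goal and routes_vary

open_world = [".....", ".....", "....."]
walled_world = ["..#..", ".....", "....."]
worlds = [open_world, walled_world]

print(is_goal_driven(planner, worlds, (0, 0), (0, 4)))       # True
print(is_goal_driven(fixed_script, worlds, (0, 0), (0, 4)))  # False
```

In this toy setup the `goal` argument plays the role of the "part of the system you can change": passing `planner` a different goal makes it avoid obstacles on the way to a different cell, whereas `fixed_script` has no such part.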

Before I got to the point in my education where I learned what is actually inside a CPU, it seemed that programming languages formed a ladder from more abstract to more concrete languages, and that it was just a matter of translating one language into another. The primitive "takes orders" capacity seemed mysterious: how could it ever appear or be explained in the hierarchy? The beauty of learning what a primitive computer is like is that none of the parts "take orders"; it's software that is done entirely in hardware.

But processors are externally driven. For agents I suspect the core property is autopoiesis, i.e. being run from signals emerging from within. Circuits will do some computation when excited but then "sleep" if the environment is not actively pushing in. Computers can keep up the excitation but will do essentially the same pattern unless disturbed from outside. Agents are the things that keep on changing their pattern even if the environment leaves them alone (or their evolution is because of the echo they make into the environment).

There is a sense in which agency is a fundamental concept. Before we can talk about physics, we need to talk about metaphysics (what is a "theory of physics"? how do we know which theories are true and which are false?). My best guess theory of metaphysics is infra-Bayesian physicalism (IBP), where agency is a central pillar: we need to talk about hypotheses of the agent, and counterfactual policies of the agent. It also looks like epistemic rationality is inseparable from instrumental rationality: it's impossible to do metaphysics without also doing decision theory.

Does this refute reductionist materialism? Well, it depends how you define "reductionist materialism". There is a sense in which IBP is very harmonious with reductionist materialism, because each hypothesis talks about the universe from a "bird's eye view", without referring to the relationship of the agent with the universe (this relationship turns out to be possible to infer using the agent's knowledge of its own source code), or even assuming any agent exists inside the universe described by the hypothesis. But, the agent is still implicit in the "whose hypothesis".

Once we accept the "viewpoint agent" (i.e. the agent who hypothesizes/infers/decides) as fundamental, we can still ask, what about other agents? The answer is: other agents are programs with high value of (see Definition 1.6 in the IBP article) which the universe is "running" (this is a well-defined thing in IBP). In this sense, other agents are sort of like rocks: emergent from the fundamental reductionist description of the universe. However, there's a nuance: this reductionist description of the universe is a belief of the viewpoint agent. The fact it is a belief (formalized as a homogeneous ultradistribution) is crucial in the definition. So, once again, we cannot eliminate agency from the picture.

The silver lining is that, even though the concept of which programs are running is defined using beliefs, i.e. requires a subjective ontology, it seems likely different agents inhabiting the same universe can agree on it (see subsection "are manifest facts objective" in the IBP article), so there is a sense in which it is objective after all. Decide for yourself whether to call this "reductionist materialism".

Since you switched the moderation to "easy-going"...

I have hinted at a definition in an old post https://www.lesswrong.com/posts/NptifNqFw4wT4MuY8/agency-is-bugs-and-uncertainty. Basically we use agency as a black-box description of something.

 Of course, as generally agreed, agency is a convenient intentional stance model. There is no agency in a physical gears-level description of a system. 

But this is circular. An abstraction for whom? What even is an abstraction, when you're in the process of defining an agent? Is there some agent-free definition of an abstraction implicitly being invoked here?

To build it up from the first principles, we must start with a compressible (not fully random) universe, at a minimum, because "embedded agents", whatever they might turn out to be, are defined by having a somewhat accurate (i.e. lossily compressed) internal model of the world, so some degree of compressibility is required. (Though maybe useful lossy compression of a random stream is a thing, I don't know.)

Next, one would identify some persistent features of the world that look like they convert free energy into entropy (note that a lot of "natural" systems behave like that, say, stars).

Finally, merging the two, a feature of the world that contains what appears to be a miniature model of the (relevant part of the) world, which also converts energy into entropy to persist the model and "itself" would be sort of close to an "agent". 

There are plenty of holes in this outline, but at least there is no circularity, as far as I can tell.
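As a small illustration of the first step only (compressibility), here is a sketch that uses lossless `zlib` compression as a crude stand-in. That substitution is my assumption: the outline above is about lossy internal models, which this does not capture. The "structured world" byte string is made up for the example.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more structure)."""
    return len(zlib.compress(data, 9)) / len(data)

# A made-up "world" with obvious regularity, and a random one of equal length.
structured_world = b"day night " * 1000
random_world = os.urandom(len(structured_world))

print(f"structured: {compression_ratio(structured_world):.3f}")
print(f"random:     {compression_ratio(random_world):.3f}")
```

The structured stream shrinks to a tiny fraction of its size, while the random one does not shrink at all, so only the former leaves room for a model smaller than the thing it models.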

One possible definition is to look for things which are more optimized than any simple mechanism you can imagine for performing a task. So, e.g., Kasparov is great at playing chess; as an amateur you can verify this by noting that any plan you can come up with will tend to do worse than Kasparov's plans (with high probability). In some sense this is an observer-relative definition, but it can be made more objective by considering the minimally-complex program that can match a given level of performance on a task, parameterized by e.g. Levin complexity. See this comment.
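A minimal sketch of the observer-relative version, with loud caveats: this is not the Levin-complexity formulation, just a sampling baseline in which the "simple mechanism" is picking plans at random. The task, `TARGET`, and the scoring rule are all hypothetical. It measures how rare a plan at least as good as the candidate is among random plans, expressed in bits.

```python
import math
import random

# Hypothetical task: a plan is a list of bits, scored by agreement with a target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

def score(plan):
    """Number of positions where the plan matches the hidden target."""
    return sum(p == t for p, t in zip(plan, TARGET))

def optimization_bits(candidate, n_samples=10_000, seed=0):
    """-log2 of the estimated fraction of random plans scoring >= the candidate."""
    rng = random.Random(seed)
    candidate_score = score(candidate)
    at_least_as_good = sum(
        score([rng.randint(0, 1) for _ in range(len(candidate))]) >= candidate_score
        for _ in range(n_samples)
    )
    # Add-one smoothing keeps the logarithm defined when no sample matches.
    fraction = (at_least_as_good + 1) / (n_samples + 1)
    return -math.log2(fraction)

expert_plan = list(TARGET)        # stands in for "Kasparov's plan"
amateur_plan = [0] * len(TARGET)  # stands in for a plan an amateur might try

print(f"expert:  {optimization_bits(expert_plan):.2f} bits")
print(f"amateur: {optimization_bits(amateur_plan):.2f} bits")
```

The estimate depends on the chosen baseline distribution, which is exactly the observer-relativity the answer mentions. Replacing random sampling with a search over programs of bounded complexity would be the step toward the more objective version.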

"optimised agent" appears not to be a tautology.

5 comments

I don't have a good answer for the question (indeed I think agent is a pretty overloaded term, and in many cases underspecified), but I'm not sure I understand why you think there's a circularity with respect to the definition of an agent that isn't present for other abstractions. Abstractions do not require an agent as part of their definition (though individual abstractions might).

Well, with something like "hammer", there's implicitly an agent (the one recognizing the thing as a hammer), but the agent isn't part of the hammer. So there's no loopiness. You can (in principle) define the hammer abstraction and the agent separately, assuming you can somehow define the agent at all.

But when trying to define what an agent is… and then using the abstraction layers that agents use to make sense of reality to define it…


…what exactly are you doing, then?

Not to say this makes it impossible. But I've yet to see a definition of "agent" that even acknowledges this loopiness, let alone addresses it.

I still don't see how this is "loopy", except in the very weak sense that you may be using the things that you mention. Self-reflective yes, but not circular. It's just another example of noticing that you share some common features with many other entities in the world.

I don't think abstraction and ability to use definitions is any necessary part of being an agent. Insects are agents. Rocks aren't. There are some sorts of fuzzy boundary between and around, but those boundaries are nowhere near the uses of the word "agent" as applied to AGI and alignment. It does so happen that some highly complex types of agents can use definitions and have models of the world that are adaptable and highly structured, and the types of agents considered when talking about AGI and alignment are usually assumed to do so, but it doesn't seem necessary to the concept of agency itself.

Meta: I didn't mean to set "Reign of Terror" moderation guidelines for this. I've updated them to "Easy Going".

Not a reductionist materialist perspective per se, but one idea I find plausible is that 'agent' makes sense as a necessary separate descriptor and a different mode of analysis precisely because of the loopiness you get when you think about thinking, a property that makes talking about agents fundamentally different from talking about rocks or hammers, the Odyssey, or any other 'thing' that could in principle be described on the single level of 'material reality' if we wanted to.

When I try to understand the material universe and its physical properties, the object-level mode of analysis functions as we've come to expect from science - I can make observations and discover patterns, make predictions and hypothesize universal laws. But what happens when that thing which does the hypothesizing encounters another thing that does the same? To comprehend is to be able to predict and control, therefore in this encounter for one agent to successfully describe the other as object is to reduce its agent properties relative to this first agent (think: a superintelligence that can model you flawlessly, and to which you are just another lever to be pushed).

Any agent can, in principle, be described as an object. But at the same time there must always be at least one agent which can not be described as object from any perspective, the one that can describe all others. Insofar as it can describe itself as object, this very capacity is its mastery over itself, its ability to transcend the very limitations it can describe on the object level. This is similar to how you still need to ascribe the ability to 'think about the world' to materialist reductionist philosophers for the philosophy to be comprehensible - if their acts are themselves understood solely as material phenomena, you're left with nothing. Even materialist metaphysics can't function without a subject.

Which is to say, I agree with your assessment. Saying "really, only the material-level is real" is a self-defeating position, and when we do talk about agents we always have to do so from a perspective. For the superintelligence I am merely material, whereas two humans can appear/present themselves as free, self-determining agents to each other. But I think there has to be another definition having to do with the capacity to reflect. A human is still an agent in-itself, though not for-itself, even if they're currently being totally manipulated by an AI - they retain their capacity for engaging in agent-like operations, while a rock won't qualify as a subject no matter how little we interfere with its development.
