Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This post is a result of numerous discussions with other participants and organizers of the MIRI Summer Fellows Program 2019.


I recently (hopefully) dissolved some of my confusion about agency. In the first part of the post, I describe a concept that I believe to be central to most debates around agency. I then briefly list some questions and observations that remain interesting to me. The gist of the post should make sense without reading any of the math.

Anthropomorphization, but with architectures that aren't humans

Architectures

Consider the following examples of "architectures":

Example (architectures)

  1. Architectures I would intuitively call "agenty":
    1. Monte Carlo tree search algorithm, parametrized by the number of rollouts made per move and by the utility function (or heuristic) used to evaluate positions.
    2. (semi-vague) "Classical AI-agent" with several interconnected modules (utility function and world model, actions, planning algorithm, and observations used for learning and updating the world model).
    3. (vague) Human parametrized by their goals, knowledge, and skills (and, of course, many other details).
  2. Architectures I would intuitively call "non-agenty":
    1. A hard-coded sequence of actions.
    2. Look-up table.
    3. Random generator (outputting a sample from P on every input, for some probability distribution P).
  3. Multi-agent architectures[1]:
    1. Ant colony.
    2. Company (consisting of individual employees, operating within an economy).
    3. Comprehensive AI services.

Working definition: An architecture A is some model, parametrizable by parameters θ from some set Θ, that receives inputs, produces outputs, and possibly keeps an internal state. We denote specific instances of A as A(θ).
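As a minimal sketch of this working definition (the class and field names below are my own, not an existing API), an architecture can be written as a parametrized object with a step function and an optional internal state:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Architecture:
    """A model A, parametrized by theta; A(theta) is one specific instance."""
    theta: Any                                 # e.g. (num_rollouts, eval_fn) for MCTS
    step_fn: Callable[[Any, Any, Any], tuple]  # (theta, state, observation) -> (output, new_state)
    state: Any = None                          # optional internal state

    def step(self, observation):
        output, self.state = self.step_fn(self.theta, self.state, observation)
        return output

# A "non-agenty" instance (Example 2.2): a look-up table from inputs to outputs.
lookup = Architecture(theta={"ping": "pong"},
                      step_fn=lambda theta, state, obs: (theta.get(obs), state))
print(lookup.step("ping"))  # -> "pong"
```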

Generalizing anthropomorphization

Throughout the post, X will refer to some object, process, entity, etc., whose behavior we want to predict or understand. Examples include rocks, wind, animals, humans, AGIs, economies, families, or the universe.

A standard item in the human mental toolbox is anthropomorphization: modeling various things as humans (specifically, ourselves) with "funny" goals or abilities. We can make the same mental move for architectures other than humans:

Working definition (A(Θ)-morphization): Let A be an architecture. Then any[2] model of X as some instance A(θ) is an A(Θ)-morphization of X.

Anthropomorphization makes good predictions for other humans and some animals (curiosity, fear, hunger). On the other hand, it doesn't work so well for rocks, lightning, and AGIs --- not that this prevents us from using it anyway. We can measure the usefulness of A(Θ)-morphization by the degree to which it makes good predictions:

Working definition (prediction error): Suppose X exists in a world W and V = (V_1, V_2, ...) is a sequence of variables (events about X) that we want to predict. Suppose that v is how V actually unfolds and p is the prediction obtained by A(Θ)-morphizing X as A(θ). The prediction error of A(θ) (w.r.t. V and X in W) is the expected Brier score of p with respect to v.

Informally, we say that A(Θ)-morphizing X as A(θ) is accurate if the corresponding prediction error is low.[3]
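To make the prediction-error definition concrete, here is a small sketch (toy code of my own, with made-up probabilities and outcomes) that scores an A(θ)-morphization's probabilistic predictions against what actually happened, using the Brier score:

```python
def brier_score(prediction: dict, outcome) -> float:
    """Mean squared error between predicted probabilities and the realized outcome."""
    return sum((p - (1.0 if event == outcome else 0.0)) ** 2
               for event, p in prediction.items()) / len(prediction)

def prediction_error(predictions, outcomes) -> float:
    """Average Brier score over a sequence of events about X."""
    return sum(brier_score(p, o) for p, o in zip(predictions, outcomes)) / len(outcomes)

# Anthropomorphizing a dog as a "hungry agent": predictions for two events about it.
predictions = [{"eats": 0.9, "ignores_food": 0.1},
               {"fetches": 0.6, "sleeps": 0.4}]
outcomes = ["eats", "sleeps"]
print(prediction_error(predictions, outcomes))  # low value = accurate morphization
```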

When do we call things agents?

Main claim:

  1. I claim that the question "Is X an agent?" is without substance, and we should instead be asking "From the point of view of some external observer H, does X seem to exhibit agent-like behavior?".
  2. Moreover, "agent-like behavior" also seems ill-defined, because what we associate with "agency" is subjective. I propose to explicitly operationalize the question as "Is A(Θ)-morphizing X accurate?".

(A related question is how difficult it is for us to "run" A(θ). Indeed, we anthropomorphize so many things precisely because it is cheap for us to do so.)

Relatedly, I believe we already implicitly do this operationalization: Suppose you talk to your favorite human H about agency. H will likely subconsciously associate agency with certain architectures, maybe such as those in Examples 1.1-1.3. Moreover, H will ascribe varying degrees of agency to different architectures --- for me, 1.3 seems more agenty than 1.1. Similarly, there are some architectures that H will associate with "definitely not an agent". I conjecture that some X exhibits agent-like behavior according to H if it can be accurately predicted via A(Θ)-morphization for some agenty-to-H architecture A. Similarly, H would say that X exhibits non-agenty behavior if we can accurately predict it using some non-agenty-to-H architecture.

Critically, exhibiting agent-like-to-H behavior and exhibiting non-agenty-to-H behavior are not mutually exclusive, and I think this causes most of the confusion around agency. Indeed, we humans seem very agenty, but, at the same time, determinism implies that there exists some hard-coded behavior that we enact. A rock rolling downhill can be viewed as merely obeying the non-agenty laws of physics, but what if it "wants to" get as low as possible? As a result, we sometimes go "Humans are definitely agents, and rocks are definitely non-agents... although, wait, are they?".
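As a toy illustration of this non-exclusivity (a sketch of my own; the observations, models, and accuracy threshold are all made up, and squared error stands in for the Brier score), the same X can be accurately predicted by both a non-agenty and an agenty morphization:

```python
def is_accurate(morphization, history, threshold=0.1):
    """A morphization is accurate (to us) if its prediction error stays below a threshold."""
    errors = [(morphization(t) - height) ** 2 for t, height in history]
    return sum(errors) / len(errors) < threshold

# X: a rock rolling downhill, observed as (time, height) pairs (made-up data).
history = [(0, 10.0), (1, 7.4), (2, 5.1), (3, 2.4), (4, 0.0)]

# Non-agenty morphization: a hard-coded look-up table of "what physics does".
physics_table = {0: 10.0, 1: 7.4, 2: 5.1, 3: 2.4, 4: 0.0}
non_agenty = lambda t: physics_table[t]

# Agenty morphization: "the rock wants to get as low as possible, at a steady pace".
agenty = lambda t: max(0.0, 10.0 - 2.5 * t)

print(is_accurate(non_agenty, history))  # True
print(is_accurate(agenty, history))      # True: both views are accurate at once
```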

If we ban the concept of agency, which interesting problems remain?

"Agency" often comes up when discussing various alignment-related topics, such as the following:

Optimizer?

How do we detect whether X performs (or is capable of performing) optimization? How do we detect this from X's architecture (or causal origin) rather than by looking at its behavior? (This seems central to the topic of mesa-optimization.)

Agent-like behavior vs agent-like architecture

Consider the following conjecture: "Suppose some X exhibits agent-like behavior. Does it follow that X physically contains an agent-like architecture, such as the one from Example 1.2?". This conjecture is false --- as an example, Q-learning is a "fairly agenty" architecture that leads to intelligent behavior. However, the resulting RL "agent" has a fixed policy and thus functions as a large look-up table. A better question would thus be whether there exists an agent-like architecture causally upstream of X. This question also has a negative answer, as witnessed by the example of an ant colony --- agent-like behavior without agent-like architecture, produced by a "non-agenty" optimization process of evolution. Nonetheless, a general version of the question remains: If some X exhibits agent-like behavior, does it follow that there exists some interesting physical structure[4] causally upstream of X?[5]
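To make the Q-learning point concrete, here is a minimal sketch (a toy chain environment and hyperparameters of my own invention, not any particular RL library): the training loop looks "fairly agenty", but the deployed policy is literally a look-up table:

```python
import random

# Toy chain environment: states 0..4, actions -1/+1, reward 1 for reaching state 4.
def env_step(s, a):
    s_next = max(0, min(4, s + a))
    return s_next, (1.0 if s_next == 4 else 0.0)

Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
for _ in range(2000):  # the "fairly agenty" part: an explicit learning loop
    s = random.randrange(5)
    a = random.choice((-1, 1)) if random.random() < 0.2 \
        else max((-1, 1), key=lambda act: Q[(s, act)])
    s_next, r = env_step(s, a)
    Q[(s, a)] += 0.1 * (r + 0.9 * max(Q[(s_next, -1)], Q[(s_next, 1)]) - Q[(s, a)])

# The deployed "agent": a fixed dictionary from states to actions, i.e. a look-up table.
policy = {s: max((-1, 1), key=lambda act: Q[(s, act)]) for s in range(5)}
print(policy)  # typically {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}
```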

Moral standing

Suppose there is some X, which I model as having some goals. When taking actions, should I give weight to those goals? (The answer to this question seems more related to consciousness than to A(Θ)-morphization. Note also that a particularly interesting version of the question can be obtained by replacing "I" with "AGI"...)

PC or NPC?

When making plans, should we model X as a part of the environment, or does X enter our game-theoretic considerations? Is X able to model us?

Creativity, unbounded goals, environment-generality

In some sense, AlphaZero is an extremely capable game-playing agent. On the other hand, if we gave it access to the internet[6], it wouldn't do anything with it. The same cannot be said for humans and unaligned AGIs, who would not only be able to orient themselves in this new environment but would eagerly execute elaborate plans to increase their influence. How can we tell whether some X is more like the former or the latter?

To summarize, I believe that many arguments and confusions surrounding agency can disappear if we explicitly use A(Θ)-morphization. This should allow us to focus on the problems listed above. Most of the definitions I gave are either semi-formal or informal, but I believe they could be made fully formal in more specific cases.

Regarding feedback: Suggestions for a better name for "A(Θ)-morphization" are super-welcome! If you know of an application for which such a formalization would be useful, please do let me know. Pointing out places where you expect a useful formalization to be impossible is also welcome.


  1. You might also view these multi-agent systems as monolithic agents, but this view might often give you wrong intuitions. I am including this category as an example that -- intuitively -- belongs to neither the "agent" nor the "not-agent" category. ↩︎

  2. By default, we do not assume that an A(Θ)-morphization of X is useful in any way, or even the most useful among all instances of A. This goes against the intuition according to which we would pick some A(θ) that is close to optimal (among A(Θ)) for predicting X. I am currently unsure how to formalize this intuition, apart from requiring that A(θ) is optimal (which seems too strong a condition). ↩︎

  3. Distinguishing between "small enough" and "too big" prediction errors seems non-trivial, since some environments are naturally more difficult to predict than others. Formalizing this will likely require additional insights. ↩︎

  4. An example of such "interesting physical structure" would be an implementation of an optimization architecture. ↩︎

  5. Even if true, this conjecture will likely require some additional assumptions. Moreover, I expect "randomly-generated look-up tables that happen to stumble upon AGI by chance" to serve as a particularly relevant counterexample. ↩︎

  6. Whatever that means in this case. ↩︎

Comments (9)

I think that the concept of "agency" (although maybe "intelligence" would be a better word?), in the context of AI alignment, implies the ability to learn the environment and exploit this knowledge towards a certain goal. The only way to pursue a goal effectively without learning is having hard-coded knowledge of the environment. But, where would this knowledge come from? For complex environments, it is only likely to come from learning algorithms upstream.

So, a rock is definitely not an agent since there is nothing it learns about its environment (I am not even sure what the input/output channels of a rock are supposed to be). Q-learning is an agent, but the resulting policy is not an agent in itself. Similarly, AlphaGo is a sort of agent when regarded together with the training loop (it can in principle learn to play different games), but not when disconnected from it. Evolution is an agent, even if not a very powerful one. An ant colony is probably a little agentic because it can learn something, although I'm not sure how much.

Yep, that totally makes sense.

Observations inspired by your comment: While this shouldn't necessarily be so, it seems the particular formulations make a lot of difference when it comes to exchanging ideas. If I read your comment without the

(although maybe "intelligence" would be a better word?)

bracket, I immediately go "aaa, this is so wrong!". And if I substitute "intelligent" for "agent", I totally agree with it. Not sure whether this is just me, or whether it generalizes to other people.

More specifically, I agree that, of the different concepts in the vicinity of "agency", "the ability to learn the environment and exploit this knowledge towards a certain goal" seems particularly important to AI alignment. I think the word "agency" is perhaps not well suited for this particular concept, since it comes with so many other connotations. But "intelligence" seems quite right.

I am not even sure what the input/output channels of a rock are supposed to be

I guess you imagine that the input is the physical forces affecting the rock and the output is the forces the rock exerts on the environment. Obviously, this is very much not useful for anything. But it suddenly becomes non-trivial if you consider something like a billiard-ball computer (which seems like a theoretical construct, and I'm not sure anybody has actually built one... but it seems like a relevant example anyway).

You mention the distinction between agent-like architecture and agent-like behavior (which I find similar to my distinction between selection and control), but how does the concept of A(Θ)-morphism account for this distinction? I have a sense that (formalized) versions of A(Θ)-morphism are going to be more useful (or easier?) for the behavioral side, though it isn't really clear.

I have a sense that (formalized) versions of A(Θ)-morphism are going to be more useful (or easier?) for the behavioral side, though it isn't really clear.

I think A(Θ)-morphization is primarily useful for describing what we often mean when we say "agency". In particular, I view this as distinct from the question of which concepts we should be thinking about in this space. (I think the promising candidates include the learning that Vanessa points to in her comment, optimization, search, and the concepts in the second part of my post.)

However, I think it might also serve as a useful part of the language for describing (non-)agent-like behavior. For example, we might want to SGD-morphize an E. coli bacterium independently of whether it actually implements some form of stochastic gradient descent w.r.t. the concentration of some chemicals in the environment.
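To give a flavor of this (a toy sketch with made-up dynamics and data, not a claim about real chemotaxis), SGD-morphizing the bacterium could mean modeling its next position as a gradient step toward higher concentration and checking how well that predicts its observed path:

```python
# Hypothetical concentration field of whatever the bacterium "likes"; peaks at x = 3.
concentration = lambda x: -(x - 3.0) ** 2

def sgd_morphization(x, lr=0.05, eps=1e-3):
    """Predict the next position as one gradient step toward higher concentration
    (deterministic here, for simplicity)."""
    grad = (concentration(x + eps) - concentration(x - eps)) / (2 * eps)
    return x + lr * grad

# Made-up observed positions of the bacterium; compare them to the model's predictions.
observed = [0.0, 0.31, 0.58, 0.82, 1.05]
predicted = [sgd_morphization(x) for x in observed[:-1]]
errors = [(p - o) ** 2 for p, o in zip(predicted, observed[1:])]
print(sum(errors) / len(errors))  # low error = the SGD-morphization is accurate here
```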

You mention the distinction between agent-like architecture and agent-like behavior (which I find similar to my distinction between selection and control), but how does the concept of A(Θ)-morphism account for this distinction?

I think of agent-like architectures as something objective, or related to the territory. In contrast, agent-like behavior is something subjective, something in the map. Importantly, agent-like behavior, or the lack of it, of some X is something that exists in the map of some entity Y (where often Y≠X).

The selection/control distinction seems related, but not quite similar to me. Am I missing something there?

I think of agent-like architectures as something objective, or related to the territory. In contrast, agent-like behavior is something subjective, something in the map. Importantly, agent-like behavior, or the lack of it, of some X is something that exists in the map of some entity Y (where often Y≠X).
The selection/control distinction seems related, but not quite similar to me. Am I missing something there?

A(Θ)-morphism seems to me to involve both agent-like architecture and agent-like behavior, because it just talks about prediction generally. Mostly I was asking if you were trying to point it one way or the other (we could talk about prediction-of-internals exclusively, to point at structure, or prediction-of-external exclusively, to talk about behavior -- I was unsure whether you were trying to do one of those things).

Since you say that you are trying to formalize how we informally talk, rather than how we should, I guess you weren't trying to make A(Θ)-morphism get at this distinction at all, and were separately mentioning the distinction as one which should be made.

I agree with your summary :). The claim was that humans often predict behavior by assuming that something has a particular architecture.

(And some confusions about agency seem to appear precisely because of not making the architecture/behavior distinction.)

This question also has a negative answer, as witnessed by the example of an ant colony --- agent-like behavior without agent-like architecture, produced by a "non-agenty" optimization process of evolution. Nonetheless, a general version of the question remains: If some X exhibits agent-like behavior, does it follow that there exists some interesting physical structure causally upstream of X?

Neat example! But for my part, I'm confused about this last sentence, even after reading the footnote:

An example of such "interesting physical structure" would be an implementation of an optimization architecture.

For one thing, I'm not sure I have much intuition about what is meant by "optimization architecture". For instance, I would not know how to begin answering the question:

Does optimization behavior imply optimization architecture?

And I have even less of a clue what is intended by "interesting physical structure" (perhaps facetiously, any process that causes agent-like behavior to arise sounds "interesting" for that reason alone).

In your ant colony example, is evolution the "interesting physical structure", and if so, how is it a physical structure?

First off, while I feel somewhat de-confused about X-like behavior, I don't feel very confident about X-like architectures. Maybe the meaning is somewhat clear on higher levels of abstraction (e.g., if my brain goes "realize I want to describe a concept --> visualize several explanations and judge each for suitability --> pick the one that seems the best --> send a signal to start typing it down", then this would be a kind of search/optimization-thingy). But on the level of physics, I don't really know what an architecture means. So take this with a grain of salt.

Maybe the term "physical structure" is misleading. The thing I was trying to point at is the distinction between being able to accurately model Y using model X, and Y actually being X. In the sense that there might be a giant look-up table (GLUT) that accurately predicts your behavior, but on no level of abstraction is it correct to say that you actually are a GLUT. Whereas modelling you as having some goals, planning, etc. might be less accurate but somewhat more, hm, true. I realize this isn't very precise, but I guess you can see what I mean.

That being said, I suppose that what I meant by "optimization architecture" is, for example, stochastic gradient descent, with the emphasis on "this is the input", "this is the part of the algorithm that does the calculation", and "this is the output". An "implementation of an optimization architecture" would be... well, the atoms of your computer that perform SGD, or maybe some simple bacterium that moves in the direction where the concentration of whatever-it-likes is highest (not that anything I know of implements precisely SGD, but still).

Ad "interesting physical structure" behind the ant-colony: If by "evolution" we mean the atoms that the world is made of, as they changed over time until your ant colony emerged...then yeah, this is a physical structure causally upstream of the ant colony, and one that is responsible for the ant colony behaving the way it does. I wouldn't say it is interesting (to me, and w.r.t. the ant colony) though, since it is totally incomprehensible to me. (But maybe "interestingness" doesn't really make sense on the level of physics, and is only relevant in relation to our abstract world-models and their understanding.)

Finally, the ideal thing that an "X-like behavior ==> Y-like architecture" theorem would cash out into is a criterion that you can actually check and use to say with certainty that the thing will not exhibit X-like behavior. (Whether this is reasonable to hope for is another matter.) So, even if all that I have written in this comment turns out to be nonsense, getting such a criterion is what we are after :-).