
Introduction

Deconfusing goal-directedness would boost your favorite research approach for solving AI Alignment.

Why? Because every approach I know of stands to gain from the clarification of goal-directedness, from Prosaic AGI Alignment to Agent Foundations. In turn, this ubiquitous usefulness motivates this sequence, which will include a review of goal-directedness in the AI Safety literature and beyond, as well as more advanced explorations by my collaborators Michele Campolo and Joe Collman and me.

But before that, I need to back up my provocative thesis. This is why this post exists: it compiles reasons to care about goal-directedness, from the perspective of every research approach and direction I could think of. Although not all reasons given are equally straightforward, none feels outrageously far-fetched to me.

I thus hope that by the end of this post, you will agree that improving our understanding of goal-directedness is relevant for you too.

Thanks to Michele Campolo and Joe Collman for many research discussions, and feedback on this post. Thanks to Alexis Carlier, Evan Hubinger, and Jérémy Perret for feedback on this post.

Meaning of Deconfusion

Before giving you the reasons for caring about goal-directedness, I need to synchronize our interpretations of “deconfusion”. The term comes from MIRI, and specifically this blog post; it captures the process of making a concept clear and explicit enough to have meaningful discussions about it. So it’s not about solving all problems related to the concept, or even formalizing it perfectly (although that would be nice) -- just about allowing coherent thinking. To quote Nate Soares (MIRI’s Executive Director, and the author of the linked blog post):

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

What would that look like for goal-directedness? To a first approximation, the idea simply means the property of trying to accomplish a goal, which feels rather simple. But after digging deeper, issues and subtleties emerge: the difference between having a goal and being competent at accomplishing it (discussed here), what should count as a goal (discussed here), which meaningful classes of goals exist (discussed here), and many others.

Thus the concept is in dire need of deconfusion. Such clarification could take many forms, including:

  • A mathematical formalization
  • A decomposition into formalized components
  • A decomposition into simpler and less confused informal components
  • A list of accepted examples with different levels of goal-directedness
  • A list of properties and their link with the intuitions behind goal-directedness
  • And many more variants

Obviously, only time will reveal the form of our results on goal-directedness. Still, it’s valuable to keep in mind the multitude of shapes they could take.

Reasons To Care

Let’s be honest: just listing research approach after research approach, along with my reason for why goal-directedness is relevant to each, might be too much information to absorb in one reading. Fortunately, the reasons I found show some trends, and fit neatly into three groups.

  • (Overseeing) In some cases, alignment comes from supervisors and overseers that monitor the AI during training. Goal-directedness is a natural and fundamental property to check, because of the many risks it creates. So deconfusion would make this important property easier for overseers and supervisors to check, and thus improve every approach that depends on monitoring.
  • (Additional Structure on Utility/Reward Functions) Many approaches to alignment rely on utility functions and reward functions to capture goals and values. Such representations are powerful, but so general that maximizing a utility function or a reward function doesn’t reveal much about whether the system actually follows a goal (see the discussion here and here).
    Furthering our understanding of goal-directedness could reveal more structure to add to these representations of goals, making the pursuit of such a “goal” more closely tied to being goal-directed.
  • (Natural Mathematical Abstraction) When attempting to formalize and clarify many aspects of decision making, AI and alignment, concepts like agency and optimization play a big role. Goal-directedness naturally relates to both, because agents are generally considered goal-directed, and so are explicit optimizers doing internal search. Thus goal-directedness should intuitively play a role in these formalizations, whether as a building block, a metric or an example to draw from.

Overseeing

The reasons in this section assume the use of an overseer. This is common for approaches affiliated with Prosaic AI Alignment, where the gist of alignment emerges from training constraints that forbid, encourage and monitor specific behaviors.

Interpretability and Formal Methods

Interpretability is one way to monitor an AI: it studies how the learned models work, and how to interpret and explain them. Similarly, formal methods (applied to AI) take a formal specification, a model of computation and an AI, and verify whether the AI follows the specification when executed on this model of computation.

Ultimately, both interpretability and formal methods try to check properties of trained models, notably neural networks. Goal-directedness is an example of an important property to look for, as discussed above. And deconfusing goal-directedness would move us towards finding a specification of this property.

(Interpretability à la the Clarity Team at OpenAI (for example here) might also prove important in deconfusing goal-directedness, by letting us look into and compare systems with various levels of goal-directedness.)

IDA and Debate

Iterated Distillation and Amplification (IDA) and AI Safety via Debate (Debate) are two alignment schemes proposed respectively by Paul Christiano and Geoffrey Irving, and extended by many others.

IDA attempts to align a superintelligent AI by starting from a simple AI, amplifying it (letting the human supervisor use several copies of this AI as assistants, which yields a more capable combined system), and then distilling this amplified system (training a new model to imitate it) into a new AI that the human supervisor can use in the next round. Hopefully, repeating this will eventually create an AI with superhuman capabilities, while maintaining alignment.
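As a toy sketch of that loop (the human, model, and distill interfaces below are hypothetical placeholders for illustration, not Christiano’s actual setup):

```python
# A toy sketch of the IDA loop. All interfaces (human.answer, distill,
# the model objects) are hypothetical placeholders for illustration only.
def ida(human, initial_model, distill, num_iterations=5):
    model = initial_model
    for _ in range(num_iterations):
        # Amplification: the human answers questions with the help of
        # several copies of the current model (no training happens here).
        def amplified(question, model=model):
            return human.answer(question, assistants=[model] * 4)
        # Distillation: train a new model to imitate the amplified system.
        model = distill(amplified)
    return model
```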

Debate, on the other hand, places the human supervisor as the judge of a debate between two AIs. The value of debate comes from extending the reach of human feedback: judging a debate (which only presents bits and pieces of the arguments) is intuitively easier than checking a complete solution, which is itself easier than finding a solution. Whether or not debate works hinges on whether honesty is favored at optimal play, and on other theoretical and empirical questions about human evaluation of debates.
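Here is a toy sketch of the protocol, with hypothetical debater and judge interfaces (not the actual Debate implementation):

```python
# A toy sketch of the Debate protocol. The debater and judge objects and
# their methods are hypothetical placeholders for illustration.
def debate(question, debater_a, debater_b, judge, num_rounds=4):
    transcript = [question]
    for round_idx in range(num_rounds):
        # Debaters alternate; each sees the transcript so far.
        current = debater_a if round_idx % 2 == 0 else debater_b
        transcript.append(current.argue(transcript))
    # The human judge only reads the short transcript, not a full solution.
    return judge.pick_winner(transcript)
```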

The current approach to IDA, and some approaches to Debate, rely on the same building block, factored cognition: answering questions by splitting them into subquestions given to other AIs that have some but not all of the context, with recursive splitting allowed. And a big question with this factorization is how to do it without allowing one subAI to manipulate the whole decision.
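To illustrate the recursive splitting, here is a minimal sketch; the agent interface (decompose, spawn, combine) is hypothetical and only meant to convey the shape of factored cognition:

```python
# A toy sketch of factored cognition. The agent interface (can_answer_directly,
# decompose, spawn, combine) is hypothetical, purely for illustration.
def answer(question, agent, depth=0, max_depth=3):
    if depth == max_depth or agent.can_answer_directly(question):
        return agent.direct_answer(question)
    # Split the question into subquestions; each sub-agent only sees its own
    # subquestion, not the full context -- which is where worries about a
    # manipulative sub-agent come in.
    subquestions = agent.decompose(question)
    subanswers = [answer(q, agent.spawn(), depth + 1, max_depth) for q in subquestions]
    return agent.combine(question, subanswers)
```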

Deconfusing goal-directedness would help, in that checking the level of goal-directedness of the subAIs, or what kind of goals they might have, correlates with checking for non-manipulation. Indeed, manipulation probably requires incentives to make long-term plans towards something, which looks, smells and sounds like goal-directedness.

Ascription Universality

Universality (see also my own explanation) is a property of an overseer that matters a great deal to approaches like IDA and Debate. Basically, an overseer O is ascription universal for a computation C, with regard to an epistemic viewpoint (usually the human supervisor H), if, whatever finite information about O and C is provided to H, the latter will privilege the probabilistic predictions of O over those of C. This is supposed to capture the fact that O is able to reveal the different beliefs or reasons that might be ascribed to C to explain it.

One very natural way to ascribe beliefs is by assuming goals and a desire to accomplish them. This is basically the intentional stance of Daniel Dennett, a classical framework for understanding goal-directedness, and deconfusing goal-directedness would put such an ascription of beliefs on more formal and computational grounds.

Mesa-Optimization

The inner alignment problem, as stated in Risks from Learned Optimization, concerns learned optimizers: optimizers that result from an optimization process, like gradient descent. Such learned optimizers are called mesa-optimizers, and the inner alignment research direction studies two questions: under which conditions mesa-optimizers appear, and how aligned the mesa-objective is with the base objective of the initial optimization process.

One common criticism of mesa-optimizers is that their objective (the mesa-objective) is assumed to be explicitly represented in the system (see Richard Ngo’s post here, for example). This is a bit weird, and looks more like a simplifying assumption than a property expected of actual systems. Moreover, maybe some of the reasoning about mesa-optimizers still works when considering learned goal-directed systems instead, as proposed in this comment by Tom Everitt.

By deconfusing goal-directedness, we might find properties of goal-directed systems and use them for studying the previous question: does the reasoning from Risks from Learned Optimization go through when considering learned models with these properties, instead of learned explicit optimizers? Such an investigation could shift the focus of research on inner alignment, or reinforce the importance of internal optimization. 

Deceptive Alignment

One especially worrying case of mesa-optimization is deceptive alignment: a mesa-optimizer with a different objective from the base objective, but which is competent enough to deceive us into believing that it is pursuing the base objective. This can take many forms, from behaving nicely during training and defecting when put in the world, up to more outlandish ideas like gradient hacking.

The foremost proposals for dealing with deceptive alignment rely on relaxed adversarial training: using the overseer feedback (on the risk of catastrophic behavior) as part of the training signal. But getting this overseer feedback right, notably the detection of deceptiveness, proves difficult.

One possibility is to use myopia. Intuitively, myopia is supposed to capture the property that a system only makes short-term plans. The hope is then that deceptive systems should probably be non-myopic. Thus we could detect non-myopia instead of deception, which is hopefully easier, and get the overseer feedback necessary for relaxed adversarial training.
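As a crude illustration only (one simplistic operationalization, not a proposed definition of myopia), we can picture a fully myopic agent as using a discount factor of zero, so that a plan which only pays off later is worthless to it:

```python
# A toy illustration: myopia as a zero discount factor (one simplistic
# operationalization, used here only to convey the intuition).
def discounted_return(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 0.0, 10.0]          # deceptive behavior that only pays off later
print(discounted_return(rewards, 0.0))   # myopic value: 1.0
print(discounted_return(rewards, 0.99))  # non-myopic value: ~10.7
```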

Now, defining myopia is an open research problem -- see these two posts for pointers. Yet there is an interesting connection with goal-directedness: multiple researchers consider long-term goals as an important component of goal-directedness. Hence disentangling the various intuitions about goal-directedness could help deconfuse the idea of long-term goals, which in turn would help tremendously for deconfusing myopia.

Additional Structure on Utility/Reward Functions

Reasons in this section apply to a broader range of alignment proposals. Their common thread is to assume that utility functions or reward functions are used to capture goals and values.

Agent Incentives

The Safety Team at DeepMind has written many papers on agent incentives; specifically, on the observation and intervention incentives that come from having a specific goal. Assuming a causal graph of the system and a goal, graphical criteria exist to find which nodes would be useful to monitor (observation incentives), and which nodes would be useful to control (intervention incentives). As goals, these papers consider maximizing the value of a utility node in the causal graph. That is, this research places itself within the framework of expected utility maximization.
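To give a feel for these criteria, here is a deliberately crude proxy I made up for this post (the actual graphical criteria in the papers are more refined): a node is only a candidate for an intervention incentive if the decision can influence it and it in turn influences the utility node.

```python
import networkx as nx

# A crude toy proxy (not the actual criteria from the DeepMind papers):
# node X is a candidate for an intervention incentive if it lies on a
# directed path from the decision node D to the utility node U.
G = nx.DiGraph([("D", "X"), ("X", "U"), ("Z", "U"), ("D", "U")])

def candidate_intervention_incentive(graph, decision, node, utility):
    return nx.has_path(graph, decision, node) and nx.has_path(graph, node, utility)

print(candidate_intervention_incentive(G, "D", "X", "U"))  # True: D influences X, and X influences U
print(candidate_intervention_incentive(G, "D", "Z", "U"))  # False: the decision cannot influence Z
```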

As mentioned before, utility functions look too general to capture exactly what we mean by goals: every system can be seen as maximizing some utility function, even those intuitively not goal-directed. Deconfusing goal-directedness might allow the derivation of more structure for goals, which could be applied to these utility functions. The goals studied in this approach would then model more closely those of actual goal-directed systems, allowing in turn the derivation of incentives for more concrete and practical settings.

Value Learning

Value Learning is a pretty broad idea, which boils down to learning what we don’t want the AI to mess up (our values), instead of trying to formalize them ourselves. This includes the reward modeling agenda at DeepMind, work on Cooperative Inverse Reinforcement Learning and Inverse Reward Design at CHAI, Stuart Armstrong’s research agenda and G Gordon Worley III’s research agenda, among others.

For all of these, the main value of deconfusing goal-directedness is the same: learning values usually takes the form of learning a utility function or a reward function, that is, something similar to a goal. But values probably share much of the structure of goals. If we had a better understanding of goal-directedness, such structure could be added to the utility functions or reward functions used to model values.

Impact Measures

Impact measures provide metrics for the impact of specific actions, notably catastrophic impact. Such an impact measure can be used to ensure that even a possibly misaligned AI will not completely destroy all value (for us) on Earth and the universe. There are many different impact measures, but I’ll focus on Alex Turner’s Attainable Utility Preservation (AUP), which is the one I know best and the one which has been discussed the most in recent years.

Attainable Utility Preservation ensures that the attainable utilities (how much value can be reached) for a wide range of goals (reward/utility functions) stay roughly the same after each action of the AI. This should notably remove the incentives for power-seeking, and thus many of the catastrophic unaligned behaviors of AI (while not solving alignment itself).
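Roughly, and only as a sketch (my paraphrase of the penalty from the AUP papers, with scaling details omitted and hypothetical attainable-utility estimators), the agent’s reward is penalized for changing how much auxiliary utility remains attainable compared to doing nothing:

```python
# A rough sketch of an AUP-style penalty (my paraphrase, scaling details
# omitted; aux_q_functions are hypothetical attainable-utility estimators).
def aup_reward(primary_reward, state, action, noop, aux_q_functions, lam=0.1):
    # Penalize changes in how much each auxiliary utility remains attainable,
    # relative to taking the no-op action.
    penalty = sum(abs(q(state, action) - q(state, noop)) for q in aux_q_functions)
    penalty /= len(aux_q_functions)
    return primary_reward - lam * penalty
```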

You guessed it, here too the value of goal-directedness comes from defining goals with more structure than simple utility or reward functions. Among other things, this might help extend AUP to more realistic environments.

Natural Mathematical Abstraction

Lastly, these reasons concern the Agent Foundations part of AI alignment research. They thus assume a focus on formalization, with applications to practical problems of alignment.

Mathematical Theory of RL and Alignment

Vanessa Kosoy from MIRI has been the main proponent of the creation of a mathematical theory of RL and alignment. Her point of view focuses on deriving formal guarantees about alignment in a learning-theoretic setting, and this requires a theory of RL dealing with issues like non-realizability and traps.

Such guarantees will probably depend on the goal-directedness of the system, as different levels of goal-directedness should produce different behaviors. So knowing how to capture these levels would ground the dependence of the guarantees on goal-directedness.

(Note that Vanessa already has her own definition of goal-directed intelligence, which doesn’t seem to completely deconfuse goal-directedness, but may be sufficient for her research).

Embedded Agency

Embedded Agency is a broad class of research directions that focus on dealing with theoretical issues linked to embeddedness -- the fact that the AI inhabits the world on which it acts, as opposed to dualistic models in which the AI and the environment are cleanly separated. The original research agenda carves out four subproblems: Decision Theory, Embedded World Models, Robust Delegation and Subsystem Alignment. I’ll focus on Embedded World Models, which has the clearest ties to goal-directedness. That being said, the others might have some links -- for example Subsystem Alignment is very close to Inner Alignment and Deceptive Alignment, which I already mentioned.

Embedded World Models asks specifically how to represent the world as a model inside the agent. Trouble comes from self-reference: since the agent is part of the world, so is its model, and thus a perfect model would need to represent itself, and this representation would need to represent itself, ad infinitum. So the model cannot be exact. Another issue comes from the lack of a hardcoded agent/environment boundary: the model needs to add one in some way.

Understanding goal-directedness would hopefully provide a way to represent systems with goals in a compressed form. This helps both with the necessary imprecision of the map (notably because the AI can model itself this way) and with drawing a line between such systems and the complex world they inhabit.

Abstraction

John S. Wentworth’s research on abstraction centers around one aspect of Embedded World Models: what can be thrown out of the perfect model to get a simpler non-self-referential model (an abstraction) that is useful for a specific purpose?

Using goal-directedness for modelling systems in a compressed way is an example of a natural abstraction. Searching for a definition of goal-directedness is thus directly relevant to abstraction research, both because of its potential usefulness for building abstractions, and because it’s such a fundamental abstraction that it might teach us some lessons on how to define, study and use abstractions in general.

Conclusion

To summarize, for a broad range of research agendas and approaches, deconfusing goal-directedness is at least partially relevant, and sometimes really important. The reasons behind that statement fit into three categories:

  • Helping an overseer to check for issues during training
  • Adding structure to utility functions/reward functions to make them behave more like goals
  • Abstracting many important systems into a compressed form

So you should probably care about goal-directedness; even without working on it, taking stock of what has been done on this question might impact your research.

The next post in this sequence lays the groundwork for such considerations, by reviewing the literature on goal-directedness: the intuitions behind it, the proposed definitions, and the debates over the shape of a good solution to the problem.

Comments

A few other ways in which goal-directedness intersects with abstraction:

  • abstraction as an instrumentally convergent tool: to the extent that computation is limited but the universe is local, we'd expect abstraction to be used internally by optimizers of many different goals.
  • instrumental convergence to specific abstract models: the specific abstract model used should be relatively insensitive to variation in the goal.
  • type signature of the goal: to the extent that humans are goal-directed, our goals involve high-level objects (like cars or trees), not individual atoms.
  • embedded agency = abstraction + generality + goal-directedness. Roughly speaking, an embedded agent is a low-level system which abstracts into a goal-directed system, and that goal-directed system can operate across a wide range of environments requiring different behaviors.

what can be thrown out of the perfect model to get a simpler non-self-referential model (an abstraction) that is useful for a specific purpose?

Kind of tangential, but it's actually the other way around. The low-level world is "non-self-referential"; the universe itself is just one big causal DAG. In order to get a compact representation of it (i.e. a small enough representation to fit in our heads, which are themselves inside the low-level world), we sometimes throw away information in a way which leaves a simpler "self-referential" abstract model. This is a big part of how I think about agenty things in a non-agenty underlying world.

Thanks for the additional ideas! I especially concur about the type signature of goals and the instrumental convergence to abstract models.

Kind of tangential, but it's actually the other way around. The low-level world is "non-self-referential"; the universe itself is just one big causal DAG. In order to get a compact representation of it (i.e. a small enough representation to fit in our heads, which are themselves inside the low-level world), we sometimes throw away information in a way which leaves a simpler "self-referential" abstract model. This is a big part of how I think about agenty things in a non-agenty underlying world.

But there's a difference between the low-level world and a perfect model of the low-level world embedded inside the world, isn't there? Also, I don't see how the compact representation is self-referential. If you mean that it can be embedded into the world, that's not what I meant.

I'm not quite clear on what you're asking, so I'll say some things which sound relevant.

I'm embedded in the world, so my world model needs to contain a model of me, which means my world model needs to contain a copy of itself. That's the sense in which my own world model is self-referential.

Practically speaking, this basically means taking the tricks from Writing Causal Models Like We Write Programs, and then writing the causal-model-version of a quine. It's relatively straightforward; the main consequence is that the model is necessarily lazily evaluated (since I'm "too small" to expand the whole thing), and then the interesting question is which queries to the model I can actually answer (even in principle) and how fast I can answer them.

In particular, based on how game theory works, there's probably a whole class of optimization queries which can be efficiently answered in-principle within this self-embedded model, but it's unclear exactly how to set them up so that the algorithm is both correct and always halts.

My world model is necessarily "high-level" in the sense that I don't have direct access to all the low-level physics of the real world; I expect that the real world (approximately) abstracts into my model, at least within the regimes I've encountered. I probably also have multiple levels of abstraction within my world model, in order to quickly answer a broad range of queries.

Did that answer the question? If not, can you give an example or two to illustrate what you mean by self-reference?

Thanks a lot! I think my misunderstanding came from conflating the computational complexity issues of self-referential simulation (expanding the model costs too much, as you mention) with the purely mathematical issue of defining such a model. In the latter sense, you can definitely have a self-referential embedded model.

I'm embedded in the world, so my world model needs to contain a model of me, which means my world model needs to contain a copy of itself. That's the sense in which my own world model is self-referential.

I'm not sure why the last "need" is true. Is it because we're assuming my world model is good/useful? Because I can imagine a world model where I'm a black box, and so I don't need to model my own world model.

In theory I could treat myself as a black box, though even then I'm going to need at least a functional self model (i.e. model of what outputs yield what inputs) in order to get predictions out of the model for anything in my future light cone.

But usually I do assume that we want a "complete" world model, in the sense that we're not ignoring any parts by fiat. We can be uncertain about what my internal structure looks like, but that still leaves us open to update if e.g. we see some FMRI data. What I don't want is to see some FMRI data and then go "well, can't do anything with that, because this here black box is off-limits". When that data comes in, I want to be able to update on it somehow.

Trouble comes from self-reference: since the agent is part of the world, so is its model, and thus a perfect model would need to represent itself, and this representation would need to represent itself, ad infinitum. So the model cannot be exact.

???

What's the issue?

One issue would be that it appears that the same argument can be used to argue for the troublesomeness of cyclic graphs.

Consider a graph that is mostly a tree, but one directed edge points to the root. What is the difference that makes your argument inapplicable to the graph, but applicable to a model of reality that contains a model of the model?

Thanks for trying to find a concrete example! Yet to be honest, I don't get yours. I don't see either a model or a world here. It seems that you consider the cyclic graph as a model of the unrolled graph, but there is no agent embedded in a world here.

Either way, I provided an explanation of what I really meant in this comment, which might solve the issue you're seeing.

The quote sounds like an argument for non-existence of quines or of the context in which things like the diagonalization lemma are formulated. I think it obviously sounds like this, so raising nonspecific concern in my comment above should've been enough to draw attention to this issue. It's also not a problem Agent Foundations explores, but it's presented as such. Given your background and effort put into the post this interpretation of the quote seems unlikely (which is why I didn't initially clarify, to give you the first move). So I'm confused. Everything is confusing here, including your comment above not taking the cue, positive voting on it, and negative voting on my comment. Maybe the intended meanings of "model" and "being exact" and "representation" are such that the argument makes sense and becomes related to Agent Foundations?

I do appreciate you pointing out this issue, and giving me the benefit of the doubt. That being said, I prefer that comments clarify the issue raised, if only so that I'm more sure of my interpretation. The up- and downvotes in this thread are, I think, representative of this preference (not that I downvoted your post -- I was glad for the feedback).

About the quote itself, rereading it and rereading Embedded Agency, I think you're right about what I write not being an Agents Foundation problem (at least not one I know of). What I had in mind was more about non-realizability and self-reference in the context of decision/game theory. I seem to have mixed the two with naive Gödelian self-reference in my head at the time of writing, which resulted in this quote.

Do you think that this proposed change solves your issues?

"This has many ramifications, including non-realizability (the impossibility of the agent to contain an exact model of the world, because it is inside the world and thus smaller), self-referential issues in the context of game theory (because the model is part of the agent which is part of the world, other agents can access it and exploit it), and the need to find an agent/world boundary (as it's not given for free like in the dualistic perspective)."

Having an exact model of the world that contains the agent doesn't require any explicit self-references or references to the agent. For example, if there are two programs whose behavior is equivalent, A and A', and the agent correctly thinks of itself as A, then it can also know the world to be a program W(A') with some subexpressions A', but without subexpression A. To see the consequences of its actions in this world, it would be useful for the agent to figure out that A is equivalent to A', but it is not necessary that this is known to the agent from the start, so any self-reference in this setting is implicit. Also, A' can't have W(A') as a subexpression, for reasons that do admit an explanation given in the quote that started this thread, but at the same time A can have W(A') as a subexpression. What is smaller here, the world or the agent?

(What's naive Gödelian self-reference? I don't recall this term, and googling didn't help.)

Dealing with self-reference in definitions of agents and worlds does not require (or even particularly recommend) non-realizability. I don't think it's an issue specific to embedded agents, probably all puzzles that fall within this scope can be studied while requiring the world to be a finite program. It might be a good idea to look for other settings, but it's not forced by the problem statement.

non-realizability (the impossibility of the agent to contain an exact model of the world, because it is inside the world and thus smaller)

Being inside the world does not make it impossible for the agent to contain the exact model of the world, does not require non-realizability in its reasoning about the world. This is the same error as in the original quote. In what way are quines not an intuitive counterexample to this reasoning? Specifically, the error is in saying "and thus smaller". What does "smaller" mean, and how does being a part interact with it? Parts are not necessarily smaller than the whole, they can well be larger. Exact descriptions of worlds and agents are not just finite expressions, they are at least equivalence classes of expressions that behave in the same way, and elements of those equivalence classes can have vastly different syntactic size.

(Of course in some settings there are reasons for non-realizability to be necessary or to not be a problem.)

Thanks for additional explanations.

That being said, I'm not an expert on Embedded Agency, and that's definitely not the point of this post, so just writing things that are explicitly said in the corresponding sequence is good enough for my purpose. Notably, the section on Embedded World Models from Embedded Agency begins with:

One difficulty is that, since the agent is part of the environment, modeling the environment in every detail would require the agent to model itself in every detail, which would require the agent’s self-model to be as “big” as the whole agent. An agent can’t fit inside its own head.

Maybe that's not correct/exact/the right perspective on the question. But once again, I'm literally giving a two-sentence explanation of what the approach says, not the ground truth or a detailed investigation of the subject.

Yeah, that was sloppy of the article. In context, the quote makes a bit of sense, and the qualifier "in every detail" does useful work (though I don't see how to make the argument clear just by defining what these words mean), but without context it's invalid.

Sorry for my last comment, it was more a knee-jerk reaction than a rational conclusion.

My issue here is that I'm still not sure of what would be a good replacement for the above quote, that still keeps intact the value of having compressed representations of systems following goals. Do you have an idea?