Alex Flint

Independent AI alignment researcher

Sequences

The accumulation of knowledge

Comments

Thanks for writing this.

Alignment research has a track record of being a long, slow slog. It seems that what we're looking for is a kind of insight that is just very, very hard to see, and the people who have made real progress seem to have done so through long periods of staring at the problem.

With your two-week research sprints, how do you decide what to work on for a given sprint?

Well, suffering is a real thing, like bread or stones. It's not a word that refers to a term in anyone's utility function, although it's of course possible to formulate utility functions that refer to it.

The direct information I'm aware of is (1) CZ's tweets about not acquiring, (2) SBF's own tweets yesterday, (3) the leaked P&L doc from Alameda. I don't think any of these are sufficient to decide "SBF committed fraud" or "SBF did something unethical". Perhaps there is additional information that I haven't seen, though.

(I do think that if SBF committed fraud, then he did something unethical.)

If you view people as Machiavellian actors using models to pursue goals, then you will eventually find social interactions bewildering and terrifying, because there actually is no way to discern honesty or kindness or good intention if you start from the view that each person is ultimately pursuing some kind of goal in an ends-justify-the-means way.

But neither does it really make sense to say "hey let's give everyone the benefit of the doubt because then such-and-such".

I think in the end you have to find a way to trust something that is not the particular beliefs or goals of a person.

In Buddhist ideology, the reason to pick one set of values over another is to find an end to suffering. The Buddha claimed that certain values tended to lead towards the end of suffering and other values tended to lead in the opposite direction. He recommended that people check this claim for themselves.

In this way values are seen as instrumental rather than fundamental in Buddhism -- that is, Buddhists pick values on the basis of the consequences of holding those values, rather than any fundamental rightness of the values themselves.

Now you may say that the "end of suffering" is itself a value; that there is nothing special about the end of suffering except to one who happens to value it. If you take this perspective then you're essentially saying: there is nothing objectively worthwhile in life, only things that certain people happen to value. But if this were true, then you'd expect to be able to go through life and see that each seemingly-worthwhile thing is not intrinsically worthwhile, but only worthwhile from a certain parochial perspective. Is that really the case?

There's mounting evidence that FTX was engaged in theft/fraud, which would be straightforwardly unethical.

I think it's way too early to decide anything remotely like that. As far as I understand, we have a single leaked balance sheet from Alameda and a handful of tweets from CZ (CEO of Binance), who presumably got to look at some aspect of FTX internals when deciding whether to acquire. Do we have any other real information?

I'm curious about this too. I actually have the sense that overall funding for AI alignment already exceeded the supply of shovel-ready projects before FTX was involved. This is normal and expected in a field that many people believe is working on an important problem, but where most of what needs funding is research, and where hardly anyone has promising, scalable uses for money.

I think this led to a lot of prizes being announced. A prize is a good way to deploy funding if you don't see enough shovel-ready projects to exhaust it: you offer prizes to anyone who can formulate and execute new projects, thereby enticing people who weren't previously working on the problem to start working on it. This is a pretty good approach IMO.

With the collapse of FTX, I guess a bunch of prizes will go away.

What else? I'm interested.

Regarding your point on ELK: to make the output of the opaque machine learning system counterfactable, wouldn't it be sufficient to include the whole program trace? By "program trace" I mean the results of all the intermediate computations performed along the way. Yet including a program trace wouldn't help us much if we don't know what function of that trace will tell us, for example, whether the machine learning system is deliberately deceiving us.

So yes, it's necessary to have an information set that includes the relevant information, but isn't the main part of the (ELK) problem to determine what function of that information corresponds to the particular latent variable we're looking for?
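To make "program trace" concrete, here is a minimal sketch in Python (the toy model and all names here are hypothetical illustrations, not anything from the ELK report): every intermediate value gets recorded, but nothing in the trace itself identifies which function of those values corresponds to a latent like deception.

```python
import numpy as np

def toy_model_with_trace(x, weights):
    """Run a tiny feed-forward model and record every intermediate computation."""
    trace = [("input", x)]
    h = x
    for i, w in enumerate(weights):
        pre = h @ w                      # intermediate result
        trace.append((f"preactivation_{i}", pre))
        h = np.maximum(pre, 0.0)         # ReLU nonlinearity
        trace.append((f"activation_{i}", h))
    trace.append(("output", h))
    return h, trace

# The trace contains every intermediate value computed along the way, but
# nothing in it labels which function of these values corresponds to a latent
# such as "the model is deceiving us" -- finding that mapping is the hard part.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
output, trace = toy_model_with_trace(rng.normal(size=(4,)), weights)
for name, value in trace:
    print(name, value.shape)
```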

If I understand you correctly, the reason that this notion of "counterfactable" connects with what we normally call a counterfactual is that when an event screens off its own history, it's easy to consider other "values" of the "variable" underlying that event without running into any logical contradictions with the other events ("values of other variables") that we're holding fixed.

For example, if I try to consider what would have happened if there had been a snow storm in Vermont last night, while holding fixed the particular weather patterns observed in Vermont and the surrounding areas on the preceding day, then I'm in kind of a tricky spot: on the one hand I'm holding fixed the weather patterns from the previous day (which did not in fact give rise to a snow storm in Vermont last night), and yet I'm also trying to "consider" a snow storm in Vermont. The closer I look into this, the more confused I'm going to get, and in the end I'll find that the notion of "considering that a snow storm took place in Vermont last night" is a bit ill-defined.

What I would like to say is: let's consider a snow storm in Vermont last night; in order to do that let's forget everything that would mess with that consideration.

My question for you is: in the world we live in, the full causal history of any real event contains almost the whole history of Earth from the time of the event backwards, because the Earth is so small relative to the speed of light, and everything that could have interacted with the event is part of the history of that event. So in practice, won't all counterfactable events need to be more or less a full specification of the whole state of the world at a certain point in time?

I expect you could build a system like this that reliably runs around and tidies your house, say, or runs your social media presence, without it containing any impetus to become a more coherent agent (because it doesn't have any reflexes that lead to pondering self-improvement in this way).

I agree, but if there is any kind of evolutionary variation in the thing then surely the variations that move towards stronger goal-directedness will be favored.

I think that overcoming this Molochian dynamic is the alignment problem: how do you build a powerful system that carefully balances itself and the whole world in such a way that it does not slip down the evolutionary slope towards pursuing psychopathic goals by any means necessary?

I think this balancing is possible, it's just not the default attractor, and the default attractor seems to have a huge basin.
