Would it be progress if one could figure out how to construct an embedded system that has a complete model of a highly compressible world, such that the system can correctly generate a plan that, when executed, would put the world into a particular target state (further simplifying assumptions follow)?

Correct planning means that the system does not, say, drop an anvil on its own head as part of its plan, and that it can generate plans which include any beneficial self-modifications, by being able to "reason over itself".

I am imagining a system that is given, as its goal, a target state of the world to be reached. The system generates a plan that, when executed, would reach the target. This plan is generated using a breadth-first tree search.

I am making the following additional assumptions:

  • The world and the agent are both highly compressible. This means we can have a representation of the entire environment (including the agent) inside the agent, for some environments. We only concern ourselves with environments where this is the case.
  • To make tree search possible:
    • The environment is discrete.
    • You know the physics of the environment perfectly.
    • You know the current state of the world perfectly.
    • You can compute everything that takes finite compute and memory instantly. (This implies some sense of Cartesianness, as I am sort of imagining the system running faster than the world: it can do an entire tree search in one "clock tick" of the environment.)
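
As a concrete illustration of the kind of planner these assumptions permit, here is a minimal sketch in Python. It assumes world states are hashable values, `actions` enumerates the discrete actions, and `step(state, action)` implements the perfectly known physics; all of these names are hypothetical illustrations rather than an existing API.

```python
from collections import deque

def plan_to(initial_state, target_state, actions, step):
    """Breadth-first search for a plan reaching target_state.

    initial_state : hashable snapshot of the current world (agent included)
    target_state  : hashable snapshot of the goal world state
    actions       : iterable of the discrete actions available
    step          : step(state, action) -> next_state, the known physics

    Returns a shortest list of actions reaching target_state, or None if
    the target is unreachable. BFS expands states in order of plan length,
    so the first plan found is a shortest one.
    """
    frontier = deque([(initial_state, [])])
    visited = {initial_state}
    while frontier:
        state, plan = frontier.popleft()
        if state == target_state:
            return plan
        for action in actions:
            nxt = step(state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None
```

Because breadth-first search expands states in order of plan length, the first plan it returns is a shortest one, which is what the later claim about finding the shortest plan relies on.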

With these assumptions, does the initial paragraph seem trivial to achieve, or would it be considered progress?

My intuition is that this would still need to solve the problem of giving an agent a correct representation of itself, in the sense that it can "plan over itself" arbitrarily. This can be thought of as enabling the agent to reason over the entire environment, which includes itself. Is that part a solved problem?

It also seems like you can think about a lot of memory optimizations in this setting. For example, you can store only one world model and mutate it to do the tree search, saving only the deltas at each node. The system could then do a significant tree search with a total memory of roughly 2x the amount required to represent the world model, assuming deltas are generally much smaller than the world model.
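
Here is a rough sketch of that optimization, under the assumption that each action's effect on the world model can be applied in place and exactly undone. It swaps strict breadth-first search for iterative deepening, since undo-based deltas fit a depth-first traversal naturally, while still returning a shortest plan; `apply_action` and `undo` are hypothetical names, not an existing API.

```python
def iddfs_plan(world, is_target, actions, apply_action, undo, max_depth):
    """Iterative-deepening search using a single mutable world model.

    apply_action(world, a) mutates `world` in place and returns a delta;
    undo(world, delta) exactly reverses that mutation. At any point, memory
    holds roughly one world model plus the deltas along the current branch.
    """

    def dfs(limit):
        if is_target(world):
            return []
        if limit == 0:
            return None
        for a in actions:
            delta = apply_action(world, a)
            rest = dfs(limit - 1)
            undo(world, delta)  # restore the shared world model
            if rest is not None:
                return [a] + rest
        return None

    # Deepen the limit one step at a time: the first limit at which a plan
    # exists is the length of a shortest plan, so the returned plan is shortest.
    for limit in range(max_depth + 1):
        plan = dfs(limit)
        if plan is not None:
            return plan
    return None
```

The design choice here is to pay extra compute (re-expanding shallow nodes at each depth limit) in exchange for memory close to a single world model, which is in the spirit of the 2x bound above.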

It seems like, once you have solved these things, you would get a working embedded system that is as smart as possible, in the sense that it would find the shortest plan resulting in the target world state.

This topic came up while working on a project in which I try to find a minimal set of assumptions under which I know how to construct an aligned system. Once I know how to construct an aligned system under that set of assumptions, I then attempt to remove an assumption and adjust the system so that it is still aligned. I am currently trying to remove the Cartesian assumption.

2 comments

This topic came up while working on a project in which I try to find a minimal set of assumptions under which I know how to construct an aligned system. Once I know how to construct an aligned system under that set of assumptions, I then attempt to remove an assumption and adjust the system so that it is still aligned. I am currently trying to remove the Cartesian assumption.

I would encourage you to consider looking at Reflective Oracles next, to describe a computationally unbounded agent which is capable of thinking about worlds that are as computationally unbounded as itself; a next logical step after that would be to look at logical induction or infra-Bayesianism, to think about agents which are smaller than what they reason about.

You can compute everything that takes finite compute and memory instantly. (This implies some sense of Cartesianness, as I am sort of imagining the system running faster than the world: it can do an entire tree search in one "clock tick" of the environment.)

This part makes me quite skeptical that the described result would constitute embedded agency at all. It's possible that you are describing a direction which would yield some kind of intellectual progress if pursued in the right way, but you are not describing a set of constraints such that I'd say a thing in this direction would definitely be progress.

My intuition is that this would still need to solve the problem of giving an agent a correct representation of itself, in the sense that it can "plan over itself" arbitrarily. This can be thought of as enabling the agent to reason over the entire environment, which includes itself. Is that part a solved problem?

This part seems inconsistent with the previous quoted paragraph; if the agent is able to reason about the world only because it can run faster than the world, then it sounds like it'll have trouble reasoning about itself. 

Reflective Oracles solve the problem of describing an agent with infinite computational resources which can do planning involving itself and other similar agents, including under uncertainty (via reflective-oracle Solomonoff induction), which sounds superior to the sort of direction you propose. However, they do not run "faster than the world", as they can reason about worlds which include things like themselves.