
Winter Cross — LessWrong

Robust Finite Policies are Nontrivially Structured

This post was created during the Dovetail Research Fellowship. Thanks to Alex, Alfred, everyone who read and commented on the draft, and everyone else in the fellowship for their ideas and discussions.

Overview

The proof detailed in this post was motivated by a desire to take a step towards solving the agent structure problem, which centers on the conjecture that a system that exhibits agent-like behavior must have agent-like structure. Our goal was to describe a scenario where something concrete about a policy's structure can be inferred from its robust behavior alone.

For this result, we model policies with deterministic finite automata and show that the automata of policies that meet certain robustness criteria must share...

Replying to The Internal Model Principle: A Straightforward Explanation

Winter Cross, 5mo


Thanks for the answer! That confirms what I was thinking.

That second case: and $(s_{1}, w_{3}) \to (s_{2}, w_{2})$ surprised me, since I initially thought that the IMP implied an additional isomorphism between the controller and the environment. I guess that isomorphism effectively still exists, since it can be created by coarse-graining the controller states, as you mentioned.

Replying to The Internal Model Principle: A Straightforward Explanation

Winter Cross, 5mo


This article is really approachable for someone like me who’s just getting acquainted with mathematical AI safety research, so I appreciate that! This definitely helped me better understand the IMP.

I have a question about this part in the "How is the controller 'modelling' the environment?" section:

If the joint system is represented by environment-controller pairs $(s, w)$, then $γ^{+}$ being injective means that no two pairs (within $X^{+}$) will have the same environment value $s$ or controller value $w$. This means that with appropriate re-labelling, each joint state can be indexed:
$(s_{1}, w_{1}), (s_{2}, w_{2}), (s_{3}, w_{3}), \ldots$

I don't see how this follows. Couldn't the shape of the joint states be more complicated, such as forming a cycle or having multiple joint states evolve to the same joint state? Both of these possibilities would break the indexing. Is there something I'm missing that implies this linear structure?
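
To make the question concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the state names, the three-element joint state set, and the cyclic transition map `step` are hypothetical and not taken from the article. It only shows that a set of pairs can have pairwise-distinct environment and controller values, and so be indexable as a labelling, while the dynamics on those pairs still form a cycle.

```python
# Hypothetical toy example: all names and states are illustrative only.
joint_states = [("s1", "w1"), ("s2", "w2"), ("s3", "w3")]


def has_distinct_components(pairs):
    """Return True if no two pairs share an environment or a controller value."""
    env = [s for s, _ in pairs]
    ctrl = [w for _, w in pairs]
    return len(set(env)) == len(env) and len(set(ctrl)) == len(ctrl)


# Assumed dynamics: a 3-cycle on the joint states (not from the article).
step = {
    ("s1", "w1"): ("s2", "w2"),
    ("s2", "w2"): ("s3", "w3"),
    ("s3", "w3"): ("s1", "w1"),
}


def orbit(start, transition, n):
    """Follow the transition map for n steps, returning every visited state."""
    state = start
    visited = [state]
    for _ in range(n):
        state = transition[state]
        visited.append(state)
    return visited


# The index is just a labelling of the pairs; it does not constrain `step`.
index = {pair: i + 1 for i, pair in enumerate(joint_states)}

if __name__ == "__main__":
    print(has_distinct_components(joint_states))
    print(orbit(("s1", "w1"), step, 3))
```

In this toy case the indexing exists even though the trajectory returns to its starting pair, which is the distinction the question is probing: whether the indexing is meant as a labelling of pairs or as a claim about the order in which they are visited.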