In the first problem, the agent could commit to one-boxing (through the mechanism I described in the link) and then finish simulating the predictor afterwards. Then the predictor would still be able to simulate the agent until it commits to one-boxing, and then prove that the agent will one-box no matter what it computes after that.

The second version of the problem seems more likely to cause problems, but it might work for the agent to restrict itself to not using the information it pre-computed for the purposes of modeling the predictor (even though it ha

... (read more)

An approach to the Agent Simulates Predictor problem

by AlexMennen 1 min read9th Apr 2016No comments


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.