For the simulation-output variant of ASP, let's say the agent's possible actions/outputs consist of all possible simulations S_i (up to some specified length), each concatenated with "one box" or "two boxes". To prove that any given action has utility greater than zero, the agent must prove that the associated simulation of the predictor is correct. Where does your algorithm have an opportunity to commit to one-boxing before completing the simulation, if it's not yet aware that any of its available actions has nonzero utility? (Or would that commitment require a

…
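The action space the comment describes can be sketched concretely. This is a hypothetical toy model, not anyone's actual algorithm: candidate simulations are abstracted to bit strings up to a fixed length (standing in for the S_i), and each action pairs one such transcript with a box choice.

```python
from itertools import product

MAX_LEN = 2  # stand-in for "some specified length" on the simulations
ALPHABET = "01"

def candidate_simulations(max_len):
    """All bit strings of length 0..max_len, abstracting the simulations S_i."""
    for n in range(max_len + 1):
        for bits in product(ALPHABET, repeat=n):
            yield "".join(bits)

def actions():
    """Each action = (candidate simulation transcript, box choice)."""
    for sim in candidate_simulations(MAX_LEN):
        for choice in ("one box", "two boxes"):
            yield (sim, choice)

acts = list(actions())
# 7 transcripts (lengths 0, 1, 2) x 2 choices = 14 actions
```

Under this encoding, proving an action has nonzero utility means proving its transcript component actually matches the predictor's computation, which is what makes an early commitment to one-boxing (before any transcript is verified) the crux of the comment's question.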

An approach to the Agent Simulates Predictor problem

by AlexMennen · 1 min read · 9th Apr 2016 · No comments


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.