I think you mean "a spurious counterfactual where the conditional utilities view the agent one-boxing as evidence that the predictor's axioms must be inconsistent"? That is, the agent correctly believes that the predictor's axioms are likely to be consistent, but also thinks that they would be inconsistent if it one-boxed, so it two-boxes?

An approach to the Agent Simulates Predictor problem

by AlexMennen · 1 min read · 9th Apr 2016 · No comments


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.