Typo: in the first full paragraph of page 2, I assume you mean the agent will one-box, not two-box.

And I'm not sure the final algorithm necessarily one-boxes even if the logical uncertainty engine thinks the predictor's (stronger) axioms are probably consistent. I think there might be a spurious counterfactual in which the conditional utilities treat the agent two-boxing as evidence that the predictor's axioms must be inconsistent. Is there a clean proof that the algorithm does the correct thing in this case?

An approach to the Agent Simulates Predictor problem

by AlexMennen, 9th Apr 2016


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.