(This post doesn't require much math. It's very speculative and probably confused.)

Wei Dai came up with a problem that seems equivalent to a variant of Counterfactual Mugging with some added twists:

- the coinflip is "logical", e.g. the parity of the millionth digit of pi;
- after you receive the offer, you will have enough resources to calculate the coinflip's outcome yourself;
- but you need to figure out the correct decision algorithm ahead of time, when you don't have these resources and are still uncertain about the coinflip's outcome.

If you give 50/50 chances now to the millionth digit of pi being even or odd, you probably want to write the decision algorithm so it agrees to pay up later even when faced with a *proof* that the millionth digit of pi is even. But from the decision algorithm's point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?
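The design-time arithmetic can be made explicit. Below is a minimal sketch, assuming the standard Counterfactual Mugging stakes mentioned later in this post ($10000 reward, $100 payment) and a 50/50 logical prior over the digit's parity; the function names are illustrative, not part of any formalism:

```python
# Design-time evaluation of the two candidate policies for the decision
# algorithm, under a 50/50 logical prior over the millionth digit of pi.
# If the digit is odd, Omega pays $10000 -- but only to an agent whose
# policy is to pay in the "even" branch; if even, Omega asks for $100.

PRIOR_ODD = 0.5    # logical probability assigned before computing the digit
PRIOR_EVEN = 0.5

def expected_value(pays_up: bool) -> float:
    reward_if_odd = 10000 if pays_up else 0  # Omega rewards only committed payers
    cost_if_even = -100 if pays_up else 0
    return PRIOR_ODD * reward_if_odd + PRIOR_EVEN * cost_if_even

print(expected_value(True))   # 4950.0
print(expected_value(False))  # 0.0
```

So at design time the "pay" policy dominates, even though after a proof that the digit is even, paying looks as pointless as paying because 2+2=4.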

One of the main selling points of TDT-style decision theories is eliminating the need for precommitment. You're supposed to always do what you would have precommitted to doing, even if it doesn't seem like a very good idea after you've done your Bayesian updates. UDT solves Counterfactual Mugging and similar problems by being "updateless", so you keep caring about possible worlds in accordance with their a priori probabilities regardless of which world you end up in.

If we take the above problem at face value, it seems to tell us that UDT should treat logical uncertainty updatelessly too, and keep caring about *logically* impossible worlds in accordance with their a priori *logical* probabilities. It seems to hint that UDT should be coded from the start with a "logical prior" over mathematical statements, which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics. Then the AI must keep following that prior forever after. But that's a very tall order. Should you really keep caring about logically impossible worlds where 2+2=5, and accept bargains that help copies of you in such worlds, even after you calculate that 2+2=4?

That conclusion is pretty startling, but consider what happens if you reject it:

- Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
- Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin. The AI doesn't have enough resources to calculate the coin's outcome, but P will have as much computing power as needed. The resulting utility goes to the AI.
- Writing P is equivalent to supplying one bit: should P pay up if asked?
- Supplying that bit is equivalent to accepting or refusing the bet "win $10000 if the millionth digit of pi is odd, lose $100 if it's even".
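The steps above can be sketched as a toy program (all names here are hypothetical illustrations, not a proposed formalism): the parent AI cannot compute the digit, so it chooses P's one-bit policy by expected value under its logical prior, and that choice is exactly the computation involved in accepting or refusing the bet. The successor P *can* compute the digit, but by then its action is fixed:

```python
PRIOR_ODD = 0.5  # the parent AI's logical probability that the digit is odd

def choose_successor_bit() -> bool:
    """Parent AI: decide at design time whether P should pay up if asked.
    This is the same computation as deciding whether to accept the bet
    'win $10000 if the digit is odd, lose $100 if it's even'."""
    ev_pay = PRIOR_ODD * 10000 + (1 - PRIOR_ODD) * (-100)
    ev_refuse = 0.0
    return ev_pay > ev_refuse

def successor_P(pay_bit: bool, digit_is_even: bool) -> int:
    """P has enough resources to compute digit_is_even, but its behavior
    was already determined by the single bit the parent supplied."""
    if digit_is_even:                # Omega asks for $100
        return -100 if pay_bit else 0
    return 10000 if pay_bit else 0   # Omega rewards committed payers

print(choose_successor_bit())                         # True
print(successor_P(choose_successor_bit(), True))      # -100: P pays up
```

The point of the sketch is that nothing in `successor_P` consults the digit to *decide*; the decision was made under logical uncertainty, and reflective consistency means P carries it out even when the coinflip's outcome is a theorem.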

So if your AI treats logical uncertainty similarly enough to probabilities that it can make bets on digits of pi, reflective consistency seems to force it to have an unchanging "logical prior", and keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious to the AI as 2+2=4. Is there any way to escape this conclusion? (Nesov has an idea, but I can't parse it yet.) And what could a formalization of "logical priors" possibly look like?