
I think CM with a logical coin is not well-defined. Say Omega determines whether or not the millionth digit of pi is even. If it's even, you verify this and then Omega asks you to pay $1000; if it's odd, Omega gives you $1000000 iff you would have paid Omega had the millionth digit of pi been even. But the counterfactual "would you have paid Omega had the millionth digit of pi been even and you verified this" is undefined if the digit is in fact odd, since you would have realized that it is odd during verification. If you don't actually verify it, then the problem is well-defined because Omega can just lie to you. I guess you could ask the counterfactual "what if your digit verification procedure malfunctioned and said the digit was even", but now we're getting into doubting your own mental faculties.

Perhaps I am missing the obvious, but why is this a hard problem? So our protagonist AI has some algorithm to determine if the millionth digit of pi is odd; he cannot run it yet, but he has it. Let's call that function f(), which returns 1 if the digit is odd and 0 if it is even. He also has some other function like: `sub pay_or_no { if (f()) { pay(1000); } }`

In this fashion, Omega can verify the algorithm that returns the millionth digit of pi, independently verify the algorithm that pays based on that return, and our protagonist gets his money.

cousin_it (8y): Good point, thanks. You're right that even-world looks just as impossible from odd-world's POV as odd-world looks from even-world, so Omega also needs to compute impossible counterfactuals when deciding whether to give you the million. The challenge of solving the problem now looks very similar to the challenge of formulating the problem in the first place :-)


(This post doesn't require much math. It's very speculative and probably confused.)

Wei Dai came up with a problem that seems equivalent to a variant of Counterfactual Mugging with some added twists:

• the coinflip is "logical", e.g. the parity of the millionth digit of pi;
• after you receive the offer, you will have enough resources to calculate the coinflip's outcome yourself;
• but you need to figure out the correct decision algorithm ahead of time, when you don't have these resources and are still uncertain about the coinflip's outcome.

If you give 50/50 chances now to the millionth digit of pi being even or odd, you probably want to write the decision algorithm so it agrees to pay up later even when faced with a proof that the millionth digit of pi is even. But from the decision algorithm's point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?

One of the main selling points of TDT-style decision theories is eliminating the need for precommitment. You're supposed to always do what you would have precommitted to doing, even if it doesn't seem like a very good idea after you've done your Bayesian updates. UDT solves Counterfactual Mugging and similar problems by being "updateless", so you keep caring about possible worlds in accordance with their a priori probabilities regardless of which world you end up in.

If we take the above problem at face value, it seems to tell us that UDT should treat logical uncertainty updatelessly too, and keep caring about logically impossible worlds in accordance with their a priori logical probabilities. It seems to hint that UDT should be coded from the start with a "logical prior" over mathematical statements, which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics. Then the AI must keep following that prior forever after. But that's a very tall order. Should you really keep caring about logically impossible worlds where 2+2=5, and accept bargains that help copies of you in such worlds, even after you calculate that 2+2=4?
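To make the contrast concrete, here is a rough sketch of how an "updating" and an "updateless" agent might evaluate the same offer once the digit is known to be even. Everything in it is an illustrative assumption rather than a formalization of UDT: the stakes ($10000 reward, $100 payment), a perfectly reliable Omega, and the hypothetical names `policy_value` and `agent_pays`.

```python
from fractions import Fraction

REWARD = 10_000  # assumed payout in the odd world
COST = 100       # assumed payment requested in the even world


def policy_value(pays: bool, p_even: Fraction) -> Fraction:
    """Value of the policy 'pay if asked' (or 'refuse'), weighting worlds by p_even."""
    even_world = -COST if pays else 0   # Omega asks for payment
    odd_world = REWARD if pays else 0   # Omega rewards would-be payers
    return p_even * even_world + (1 - p_even) * odd_world


def agent_pays(prior_even: Fraction, digit_is_even: bool, updateless: bool) -> bool:
    """Decide whether to pay after the agent has computed the digit's parity."""
    if updateless:
        # Keep weighting worlds by the original logical prior.
        p_even = prior_even
    else:
        # Update on the computation: the other world now gets zero weight.
        p_even = Fraction(1) if digit_is_even else Fraction(0)
    return policy_value(True, p_even) > policy_value(False, p_even)


if __name__ == "__main__":
    half = Fraction(1, 2)
    print("updateless pays:", agent_pays(half, True, updateless=True))
    print("updating pays:  ", agent_pays(half, True, updateless=False))
```

The two agents run the same comparison; the only difference is which probability gets plugged in. That is why the question of whether the logical prior may ever change carries so much weight here.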

That conclusion is pretty startling, but consider what happens if you reject it:

1. Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
2. Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin. The AI doesn't have enough resources to calculate the coin's outcome, but P will have as much computing power as needed. The resulting utility goes to the AI.
3. Writing P is equivalent to supplying one bit: should P pay up if asked?
4. Supplying that bit is equivalent to accepting or refusing the bet "win $10000 if the millionth digit of pi is odd, lose $100 if it's even".
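The equivalence in steps 3 and 4 can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a reliable Omega and treats the AI's logical uncertainty as an ordinary probability `prior_even`; the names `bet_value` and `break_even_prior` are hypothetical.

```python
from fractions import Fraction

WIN = 10_000  # gained if the digit is odd (stakes from step 4)
LOSE = 100    # lost if the digit is even


def bet_value(accept: bool, prior_even: Fraction) -> Fraction:
    """Expected utility of supplying the bit 'P pays if asked' (accept) or not."""
    if not accept:
        return Fraction(0)  # P refuses: nothing paid, nothing won
    return prior_even * (-LOSE) + (1 - prior_even) * WIN


def break_even_prior() -> Fraction:
    """Probability of 'even' at which accepting and refusing are equally good."""
    # Solve p * (-LOSE) + (1 - p) * WIN = 0 for p.
    return Fraction(WIN, WIN + LOSE)


if __name__ == "__main__":
    print(bet_value(True, Fraction(1, 2)))  # value of accepting under a 50/50 prior
    print(break_even_prior())               # prior above which refusing is better
```

Under a 50/50 prior, accepting is worth 4950 in expectation against 0 for refusing, and the AI would need to assign more than 100/101 to "even" before refusing looks better. So any AI willing to bet on digits of pi at anything like even odds would write P to pay.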

So if your AI treats logical uncertainty similarly enough to probabilities that it can make bets on digits of pi, reflective consistency seems to force it to have an unchanging "logical prior", and keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious to the AI as 2+2=4. Is there any way to escape this conclusion? (Nesov has an idea, but I can't parse it yet.) And what could a formalization of "logical priors" possibly look like?