(This post doesn't require much math. It's very speculative and probably confused.)

Wei Dai came up with a problem that seems equivalent to a variant of Counterfactual Mugging with some added twists:

  • the coinflip is "logical", e.g. the parity of the millionth digit of pi;
  • after you receive the offer, you will have enough resources to calculate the coinflip's outcome yourself;
  • but you need to figure out the correct decision algorithm ahead of time, when you don't have these resources and are still uncertain about the coinflip's outcome.

If you give 50/50 chances now to the millionth digit of pi being even or odd, you probably want to write the decision algorithm so it agrees to pay up later even when faced with a proof that the millionth digit of pi is even. But from the decision algorithm's point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?

One of the main selling points of TDT-style decision theories is eliminating the need for precommitment. You're supposed to always do what you would have precommitted to doing, even if it doesn't seem like a very good idea after you've done your Bayesian updates. UDT solves Counterfactual Mugging and similar problems by being "updateless", so you keep caring about possible worlds in accordance with their apriori probabilities regardless of which world you end up in.

If we take the above problem at face value, it seems to tell us that UDT should treat logical uncertainty updatelessly too, and keep caring about logically impossible worlds in accordance with their apriori logical probabilities. It seems to hint that UDT should be coded from the start with a "logical prior" over mathematical statements, which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics. Then the AI must keep following that prior forever after. But that's a very tall order. Should you really keep caring about logically impossible worlds where 2+2=5, and accept bargains that help copies of you in such worlds, even after you calculate that 2+2=4?

That conclusion is pretty startling, but consider what happens if you reject it:

  1. Precommitment can be modeled as a decision problem where an AI is asked to write a successor AI.
  2. Imagine the AI is asked to write a program P that will be faced with Counterfactual Mugging with a logical coin. The AI doesn't have enough resources to calculate the coin's outcome, but P will have as much computing power as needed. The resulting utility goes to the AI.
  3. Writing P is equivalent to supplying one bit: should P pay up if asked?
  4. Supplying that bit is equivalent to accepting or refusing the bet "win $10000 if the millionth digit of pi is odd, lose $100 if it's even".

So if your AI treats logical uncertainty similarly enough to probabilities that it can make bets on digits of pi, reflective consistency seems to force it to have an unchanging "logical prior", and keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious to the AI as 2+2=4. Is there any way to escape this conclusion? (Nesov has an idea, but I can't parse it yet.) And what could a formalization of "logical priors" possibly look like?

New Comment
49 comments, sorted by Click to highlight new comments since: Today at 7:21 PM

I think CM with a logical coin is not well-defined. Say Omega determines whether or not the millionth digit of pi is even. If it's even, you verify this and then Omega asks you to pay $1000; if it's odd Omega gives you $1000000 iff. you would have paid Omega had the millionth digit of pi been even. But the counterfactual "would you have paid Omega had the millionth digit of pi been even and you verified this" is undefined if the digit is in fact odd, since you would have realized that it is odd during verification. If you don't actually verify it, then the problem is well-defined because Omega can just lie to you. I guess you could ask the counterfactual "what if your digit verification procedure malfunctioned and said the digit was even", but now we're getting into doubting your own mental faculties.

Perhaps I am missing the obvious, but why is this a hard problem? So our protagonist AI has some algorithm to determine if the millionth digit of pi is odd- he cannot run it yet, but he has it. Lets call that function f{}, that returns a 1 if the digit is odd, or a 0 if it is even. He also has some other function like: sub pay_or_no { if (f{}) { pay(1000); }

In this fashion, Omega can verify the algorithm that returns the millionth digit of pi, independently verify the algorithm that pays based on that return, and our protagonist gets his money.


This seems to be the correct answer to jacobt's question. The key is looking at the length of proofs. The general rule should go like this: when you're trying to decide which of two impossible counterfactuals "a()=x implies b()=y" and "a()=x implies b()=z" is more true even though "a()=x" is false, go with the one that has the shorter proof. We already use that rule when implementing agents that compute counterfactuals about their own actions. Now we can just implement Omega using the same rule. If the millionth digit of pi is in fact odd, but the statement "millionth digit of pi is even => agent pays up" has a much shorter proof than "millionth digit of pi is even => agent doesn't pay up", Omega should think that the agent would pay up.

The idea seems so obvious in retrospect, I don't understand how I missed it. Thanks!

If the millionth digit of pi is in fact odd, but the statement "millionth digit of pi is even => agent pays up" has a much shorter proof than "millionth digit of pi is even => agent doesn't pay up", Omega should think that the agent would pay up.

This seems equivalent to:

has a much shorter proof than "millionth digit of pi is odd"

But does that make sense? What if it were possible to have really short proofs of whether the n-th digit of pi is even or odd and it's impossible for the agent to arrange to have a shorter proof of "millionth digit of pi is even => agent pays up"? Why should the agent be penalized for that?

Maybe the whole point of a logical coinflip is about being harder to prove than simple statements about the agent. If the coinflip were simple compared the the agent, like "1!=1", then a CDT agent would not have precommitted to cooperate, because the agent would have figured out in advance that 1=1. So it's not clear that a UDT agent should cooperate either.

I agree, this seems like a reasonable way of defining dependencies between constant symbols. In case of logical uncertainty, I think you'd want to look into how relative lengths of proofs depend on adding more theorems as axioms (so that they don't cost any proof length to use). This way, different agents or an agent in different situations would have different ideas about which dependencies are natural.

This goes all the way back to trying to define dependencies by analogy with AIXI/K-complexity, I think we were talking about this on the list in spring 2011.

Good point, thanks. You're right that even-world looks just as impossible from odd-world's POV as odd-world looks from even-world, so Omega also needs to compute impossible counterfactuals when deciding whether to give you the million. The challenge of solving the problem now looks very similar to the challenge of formulating the problem in the first place :-)

I pointed out the same issue before, but it doesn't seem to affect my bargaining problem.

Why not? It seems to me that to determine that the staple maximizer's offer is fair, you need to look at the staple maximizer's assessment of you in the impossible world where it gets control. That's very similar to looking at Omega's assessment of you in the impossible world where it's deciding whether to give you the million. Or maybe I'm wrong, all this recursion is making me confused...

What I meant is, in my version of the problem, you don't have to solve the problem (say what Omega does exactly) in order to formulate the problem, since "the staple maximizer's assessment of you in the impossible world where it gets control" is part of the solution, not part of the problem specification.

It seems to hint that UDT should be coded from the start with a "logical prior" over mathematical statements, which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics.

This seems like a type error. Statements are references to collections of structures, they are not themselves the kind of stuff agent should care about, they are elements of agent's map, not territory. Caring about physics generalizes to caring about abstract structures, not to caring about statements that describe abstract structures. So there might be some notion of a prior over abstract structures, possibly expressed in terms of logical statements about them, but it wouldn't be a prior over the statements.

In principle, I see no reason to treat logical probability differently from other probability - if this can be done consistently.

Say there were some empirical fact about the world, concealed inside a safe. And say you can crack the safe's combination as soon as you can solve a certain logical fact. Then "ignorance about the contents of the safe", a very standard type of ignorance, feels exactly like "ignorance about the logical fact" in this case.

I think you can generally transform one type of uncertainty into another in a way that leaves the intuitions virtually identical.

This is not really what the problem discussed in this post about. Given a setting where there are many possible worlds for all kinds of alternative observations, we have three basic kinds of uncertainty: logical uncertainty, uncertainty about the joint state of all possible worlds ("state uncertainty"), and uncertainty about location within the collection of these possible worlds (indexical uncertainty). If there are enough possible worlds in our setting, then most observations of the kind "Is this box empty?" cash out as indexical uncertainty: in some possible worlds, it's empty, and in others it's not, so the only question is about which worlds it's empty in, a question of finding the locations within the overall collection that fit the query.

Of these, logical uncertainty is closer to state uncertainty than to indexical uncertainty: if you figure out some abstract fact, that may also tell you what all possible (non-broken) calculators will say, but some of the boxes will be full, and some will be empty. Of course, there is no clear dividing line, it's the structure of the collection of your possible worlds and prior over it that tells you what observations are more like calculators (related to abstract facts), and which are more like boxes (unrelated to abstract facts, mostly only telling you which possible worlds you observe).

(The UDT's secret weapon is that it reduces all observations to indexical uncertainty, it completely ignores their epistemic significance (interpretation as abstract facts), and instead relies on its own "protected" inference capacity, to resolve decision problems that are set up across its collection of possible worlds in arbitrarily bizarre fashion. But when it starts relying on observations, it has to be more clever than that.)

Now, you are talking about how logical uncertainty is similar to state uncertainty, which I mostly agree with, while the problem under discussion is that logical uncertainty seems to be unlike indexical uncertainty, in particular for the purposes of applying UDT-like reasoning.

I was under the impression that there were clear examples where logical uncertainty was different than regular uncertainty. I can't think of any, though, so perhaps I'm missremembering. I would be very interested in the solution tho this.

Logical uncertainty is still a more subtle beastie methinks - but for the examples given here, I think it should be treated like normal uncertainty.

"Normal" priors are about comparative value of worlds, with observations only resolving indexical uncertainty about your location among these worlds. In UDT, there is typically an assumption that an agent has excessive computational resources, and so the only purpose of observations is in resolving this indexical uncertainty. A UDT agent is working with a fixed collection of possible worlds, and it doesn't learn anything about these worlds from observation. It devises a general strategy that is evaluated by looking how it fares at all locations that use it, across the fixed collection of possible worlds.

In contrast, logical uncertainty is not about location within the collection of possible worlds, it's about the state of those worlds, or even about presence of specific worlds in the collection. The value of any given strategy that responds to observations would then depend on the state of logical uncertainty, and so evaluating a strategy is not as simple as taking the current epistemic state's point of view.

A new possibility opens: some observations can communicate not just indexical information, but also logical information (alternatively, information about the state of the collection of possible worlds, not just location in the worlds of the collection). This possibility calls for something analogous to anthropic reasoning: the fact that an agent observes something tells it something about the big world, not just about which small world it's located in. Another analogy is value uncertainty: resolving logical uncertainty essentially resolves uncertainty about agent's utility definition (and this is another way of generating thought experiments about this issue).

So when an agent is on a branch of a strategy that indicates something new about the collection of possible worlds, the agent would evaluate the whole strategy differently from when it started out. But when it started out, it could also predict how the expected value of the strategy would look given that hypothetical observation, and also given the alternative hypothetical observations. How does it balance these possible points of view? I don't know, but this is a new problem that breaks UDT's assumptions, and at least to this puzzle the answer seems to be "don't pay up".

Our set of possible worlds comes from somewhere, some sort of criteria. Whatever generates that list passes it to our choice algorithm, which begins branching. Lets say we receive an observation that contains both Logical and Indexical updates- could we not just take our current set of possible worlds, with our current set of data on them, update the list against our logical update, and pass that list on to a new copy of the function? The collection remains fixed as far as each copy of the function is concerned, but retains the ability to update on new information. When finished, the path returned will be the most likely given all new observations.

I don't think there's any particular reason that being updateless should be useful in general - it's just that "find the winning strategy, and then do that" is equivalent to being updateless, and works really well on most problems.

In fact, I'd expect a computation-aware decision theory to not say "find the winning strategy, and then do that," because it couldn't always expect to do that except in the limit of infinite computing time (even if the problem is guaranteed solvable) - and that limit would also eliminate logical uncertainty.

(Sorry - I've just stumbled on this thread and I'm not sure whether I should post this comment here or on Wei Dai's original post).

It seems to me that the right thing for clippy to do is choose 10^20 staples instead of 10^10 paperclips, but not because of anything to do with logical uncertainty.

There are presumably a vast number of parallel (and/or non-logically-counterfactual) universes where games of this nature are being played out. In almost exactly half of them, the roles of clippy and staply will be reversed. A UDT clippy will happily cooperate in this case knowing that the staplies in parallel universes will do the same and generate lots of paperclips in the process.

This would still be the case if there's no pi calculation involved, and Omega just flies to a planet, finds the first two utility maximizers with orthogonal utility functions and offers one of them a choice of 10^10 units of utility (confusion note: according to what scale???) or 10^20 units of the other agent's utility.

Added fun:

  1. A paperclip maximizer seems to have an inherent advantage against a paperclip minimizer (i.e. eventually Omega is unable to destroy any more paperclips). But what if you were a (paperclip minus staple) maximizer? Any win in one universe is going to be exactly canceled out by an analogous win for a (staple minus paperclip) maximizer in another universe, if we assume that those will be spawned with equal probability.

  2. My argument only works because the concept of paperclips seems non-entangled with the concept of the millionth digit of pi. What if you replace "paperclip" with "piece of paper saying the millionth digit of pi is odd"?

What do you think of this thread, in particular the part quoted in the last comment?

The quote was:

I think it's right to cooperate in this thought experiment only to the extent that we accept the impossibility of isolating this thought experiment from its other possible instances

I agree with one direction of implication - if we accept the impossibility of isolating this thought experiment form its other possible instances (e.g. cases with the exact same wording but with paperclips and staples swapped) then it's right to cooperate.

If we don't accept that then I will admit to being confused and have nothing meaningful to say either way. I accept the "least convenient world" principle for when someone suggests a really bizarre thought experiment, but I'm having trouble with the concept of "least convenient set of possible counterfactual worlds". Is this concept worth exploring in its own right?


I get the impression that your models of decision theory implicitly assume that agents have logical omniscience. Logical uncertainty contradicts that, so any theory with both will end up confused and shuffling round an inconsistency.

I think to solve this you're going to have to explicitly model an agent with bounded computing power.

This problem is even trickier, you have to explicitly model an agent with unlimited computing power that momentarily adopts the preferences of a bounded agent.

Question: is there a proof that strategies that follow "always doing what you would have precommitted to doing" always dominate strategies that do not, in some sense? Or perhaps "adding a precommittment stage always improves utility"?

No - if the universe punishes you for behaving some way (e.g. adding a precomittment stage), then doing that is dominated. There are no dominant strategies against all possible states of the universe.

Fair enough, but perhaps there proofs that it dominates unless you are punished specifically for doing it?

How sure are people that "do what you would have precommitted to doing" is a good strategy? Wanting to build it into the decision theory seems to suggest very high certainty.

Well, it seems obvious that it's true - but tricky to formalise. Subtle problems like agent simulates predictor (when you know more than Omega) and maybe some diagonal agents (who apply diagonal reasoning to you) seem to be relatively believable situations. It's a bit like Godel's theorem - initially, the only examples were weird and specifically constructed, but then people found more natural examples.

But "do what you would have precommitted to doing" seems to be much better than other strategies, even if it's not provably ideal.

I posted this article to the decision theory group a moment ago. It seems highly relevant to thinking concretely about logical uncertainty in the context of decision theory, and provides what looks to be a reasonable metric for evaluating the value of computationally useful information.

ETA: plus there is an interesting tie-in to cognitive heuristics/biases.

Link nitpick: When linking to arXiv, please link to the abstract, not directly to the PDF.

I wonder if the downvote I see here would have been an upvote if you used 'request' instead of 'nitpick' (not that I mind either way).

which encodes the creator's arbitrary "logical degrees of caring", just like its regular prior encodes the creator's arbitrary degrees of caring over physics

(Has there been any work on moving from an arbitrary point of timestamping to something that obeys something like dynamic consistency requirements? One could think up decision problems where two UDTs would have coordinated except their moment of caring-encoding was arbitrarily single-pointed in spacetime, then try to use such examples to motivate generalized principles or notions of consistency. That's a different way of advancing UDT that seems somewhat orthogonal to the focus on self-reference and logical uncertainty. (The intuition being, of course, that arbitrariness is inelegant, and if you see it that's a sign you need to go meta.) Maybe Nesov's desire to focus more on processes and pieces rather than agents will naturally tend in this direction.)

Um, aren't timestamped preferences already dynamically consistent?

But from the decision algorithm's point of view, the situation looks more like being asked to pay up because 2+2=4. How do we resolve this tension?

If probability is all in the mind, how is this different? What is the difference between eventually calculating an unknown digit or simply waiting for the world to determine the outcome of a coin toss? I don't see any difference at all.

Logical uncertainty is weird because it doesn't exactly obey the rules of probability. You can't have a consistent probability assignment that says axioms are 100% true but the millionth digit of pi has a 50% chance of being odd. So I won't be very surprised if the correct way to treat logical uncertainty turns out to be not completely Bayesian.

You can't have a consistent probability assignment that says axioms are 100% true but the millionth digit of pi has a 50% chance of being odd.

Why not? If you haven't actually worked out the millionth digit of pi, then this probability assignment is consistent given your current state of knowledge. It's inconsistent given logical omniscience, but then if you were logically omniscient you wouldn't assign a 50% chance in the first place. The act of observing a logical fact doesn't seem different from the act of making any other observation to me.

If you knew that the universe doesn't obey Newtonian mechanics precisely, it would be inconsistent to assign them a high probability, but that doesn't mean that an early physicist who doesn't have that knowledge is violating the rules of probability by thinking that the universe does follow Newtonian mechanics. It's only after you make that observation that such an assignment becomes inconsistent.

Actually this strikes me as a special case of dealing with the fact that your own decision process is imperfect.

I think that in the Counterfactual Mugging with a logical coin, unlike CM with a quantum coin, it is incorrect to abstract away from Omega's internal algorithm. In the CM with a quantum coin, the coin toss screens off Omega's causal influence. With the logical coin it is not so.

Good point, but what if you care about only a single world program where Omega is hardcoded to use the millionth digit of pi as the coinflip, and you have logical uncertainty about that digit?

Then I'll have to know (or have probability distribution over) where the world program comes from. If it was created by Omega-2, then I'm back at square 1. If the world program is laws of physics, then I suppose, logical uncertainty is equivalent to logical probability, like in a regular (non-quantum) coin toss. But then, the CM problem is a very poor model of typical laws of physics. And, with laws of physics, you'll never need to accept bargains conditioned on 2+2=5...

But you might need to accept bargains conditioned on more complicated logical facts, and the bargains may involve future versions of you who will find these logical facts as trivial as 2+2=4.

If we want decision theory to answer the question "what kind of AI should I write?", then using "logical priors" is very likely to be the right answer. But Wei has a different way of looking at the problem which seems to make it much harder: he asks "what decision theory should I be following?"

I think I need a different problem as an intuition pump for this. Can't reformulate the CM problem satisfactorily. It all comes back to Omega's original motivation. Either it was "fair", and so equivalent to a quantum coin toss at some point in the causal chain, or not. If it was fair, then it's equivalent to regular CM, so I should accept the bargain and not "update" later, even for 2+2=5. If not, then it all depends on what it is...

Quantum juju has nothing to do with decision theory. I guess I should have included it in the post about common mistakes. What would you say about this problem if you lived in a deterministic universe and never heard about quantum physics? You know that boundedly rational agents in such universes can observe things that look awfully similar to random noise, right?

Well, tossing a quantum coin is a simple way to provably sever causal links. In a deterministic universe with boundedly rational agents, I suppose, there could be cryptographical schemes producing equivalent guarantees.

What if I reformulate as follows: Omega says that it tossed a coin and so chose to check either the oddness or the evenness of the millionth digit of pi. Coin indicated the "oddness", so bla bla bla.

The properties of the problem appear the same as the logical-coin CM, except now the possible causal influence from Omega is severed.

If I'm Omega, and I decide to check whether the 10^10 th digit of pi is 0, 2 or 5, and reward you if it is... how would you feel about that? I chose those numbers because we have ten fingers, and I chose reward because "e" is the 5th letter in the alphabet (I went through the letters of "reward" and "punish" until I found one that was the 10th (J), 5th (E) or 2nd (B) letter).

Or a second variant: I implement the logical-coin CM that can be described in python in the most compact way.

If it's true that you chose the numbers because we have ten fingers (and because of nothing else), and I can verify that, then I feel I should behave as if the event is random with probability 0.3, even if it was the 10-th digit of pi, not 10^10-th.

Yep - welcome to logical uncertainty!

I never had anything against logical uncertainty :)

The point, though, is that this setup - where I can verify Omega's honest attempt at randomness - does not produce the paradoxes. In particular, it does not allow someone to pump money out of me. And so it seems to me that I can and should "keep paying up in Counterfactual Mugging even when the logical coinflip looks as obvious as 2+2=4."