Counterfactual Mugging and Logical Uncertainty

[-]Wei Dai16y70

Would you (or your ideal of rationality) still give $100 if I replace "10000th decimal digit of pi" with "the 10000th positive integer", or with "the smallest non-negative integer", or with just "0"?

If not, what's special about "10000th decimal digit of pi"? (Apparently you're assuming that you can compute it in your head, so that's not the difference.)

If yes, how do you (or Omega) compute a counterfactual where 0 is odd, or 1 is even?

[-]Wei Dai16y50

Upon further thought: there is no objective answer to "what you would do if 1 was even" or "what you would do if the 10001st digit of pi was even" (given your source code). The answer that Omega computes has to be more or less arbitrary, and depends on details of Omega's source code. If you knew that Omega was going to logical-counterfactually mug you, and you knew Omega's source code, and the reward was high enough, then you'd make whatever modifications are necessary to your own source code so that Omega would compute the "right" answer and reward you.

Therefore, if we include such problems in the problem class for which a decision algorithm should be reflectively consistent, then no decision algorithm is reflectively consistent.

ETA: Notice that in the version of CM with a physical coin, or with the n-th digit of pi where Omega is not computing what you would do if it was even or odd, but what you would do if you were told that it is even or odd, there is an objective answer to "what you would do if you were to receive the input 'coin landed tails'" and "what you would do if you were to receive the input '10000-th digit of pi is odd'", which simply involves running your source code on the given input.
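
A minimal sketch of the "told that it is even or odd" version, to make the point concrete. The agent's policy and the names agent and omega_predicts are hypothetical, chosen only for illustration; the point is that this kind of counterfactual is nothing more than the agent's code run on a different input.

```python
def agent(observation: str) -> str:
    """Hypothetical agent source code: maps an observation to an action."""
    # Assumed policy for illustration: pay whenever told the losing outcome.
    losing_inputs = ("coin landed tails", "10000-th digit of pi is odd")
    if observation in losing_inputs:
        return "give $100"
    return "keep money"


def omega_predicts(agent_source, hypothetical_input: str) -> str:
    """Objective counterfactual: run the unmodified agent on a chosen input."""
    return agent_source(hypothetical_input)


for text in ("coin landed tails", "10000-th digit of pi is odd", "coin landed heads"):
    print(text, "->", omega_predicts(agent, text))
```

No fact about pi has to be altered here; only the input string changes, which is why this version has a well-defined answer.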

[-]Technologos16y20

My understanding of the point of the post was that while a coin may physically land differently and thus instantiate the counterfactual, it is merely my current lack of knowledge (the "logical uncertainty" in the post title) that allows me to simulate a kind of pseudo-counterfactual in this case.

Since I do not know the millionth digit of pi, I can still speak meaningfully of the cases where it is and isn't odd.

[-][anonymous]16y00

The 10001st digit of pi is 5.

[-]Vladimir_Nesov16y30

The simplest case is when a fact that is being considered counterfactually is received from a given observation, so that you can explicitly say where the parameter is in the system, and use the dynamic specification of the system to see what happens to it depending on the parameter. That's the case with the coin and random digit index.

The 10000th digit of pi is one step more complicated, but it's still independent of most of your knowledge, so it's conceptually easier to localize knowledge about it in your mind. Once you start considering the question, knowledge about its answer starts affecting your dynamic, and this influence can likewise be tracked to the source. That's why I introduced Pi(n) as a local expression: all the knowledge in the algorithm about the answer to this question comes from this single procedure, so by varying its contents you can examine the impact of its different values on the future behavior.

Whether or not 1 is even is much more pervasive, so the surgery that changes it will be hard and not at all intuitively obvious. So, the disagreement seems to be that you trust your intuition about whether it's possible to make 1 an even number in your mind, while I trust the generalization of the idea that you can change whether the coin lands on one side or another, whether Pi(10000) is even or odd, and arbitrarily more pervasive questions as well.

This does depend a lot on what Omega understands by the question (how Omega's algorithm logically depends on the question, and on your algorithm), which is related to my unwillingness to conclude that mutual cooperation is the clear-cut outcome of PD. In this thought experiment, this understanding is mostly specified; in other cases an intuitive grasp of the problem won't be enough.

[-]Wei Dai16y30

If a theory of logical counterfactuals is to apply to statements of the form "If X was true, then Y would be true", do we need to restrict the forms of X and Y, or can they be arbitrary mathematical propositions?

For example, does it make sense to ask something like, "What is 13*3, if 3*3 was 8?" An obvious answer is "38", but what if you're doing multiplication in binary?

[-]SilasBarta16y40

I don't see why a theory of counterfactuals couldn't apply to mathematical propositions. After all, our cognitive architectures use causality at a primitive level, and the same architecture is taught math.

And certainly, while learning math, you were taught results that didn't "seem" right at the time, so you worked backwards until you could understand why that result (like 2+6 = 8) makes sense.

So you just have to imagine yourself in such a similar situation about math, learning it for the first time. If everyone in class seemed to understand multiplication but you, and it were also a fact that 3*3 = 8, what process would you figure was actually going on when you multiply? Then, apply that to 13*3.

[-]Vladimir_Nesov16y10

"What is 13*3, if 3*3 was 8?"

To this I ask: "Which 3*3?". The whole procedure is something that is done with a description of a program (system), and any facts of which we can speak as holding for the system are properties of the system's "mind". Thus, the fact of what 3*3 is must be located somewhere specifically (more generally, as a property), for it to be meaningful to talk about this fact in relation to the system. You are considering the interaction between this fact, as a parameter, and the rest of the system, and this activity requires treating both on equal terms.

When you, as a human, read the question, you may try to interpret it as pointing to a specific subsystem, as I did in the post. More generally, the question is only meaningful in this way if it admits such an interpretation.

[-]Wei Dai16y00

I think I sort of see what you mean. Perhaps this is an avenue worth exploring, given that we don't seem to have many other suggestions on how to solve logical uncertainty. I'll have to think on this more.

[-][anonymous]16y00

The 10000th decimal digit of pi is 8, by the way (not counting the leading 3).

[-]Steve_Rayhawk16y40

What does Omega do if your algorithm contains "if ⊥ is provable, then give $100"?

[-]Vladimir_Nesov16y00

Could you be more explicit about the intention of the question?

Omega's surgery doesn't introduce contradictions in your code: it doesn't (say) make Pi(10000) evaluate to 7, it just replaces Pi(10000) in the code with 7, which gives perfectly good code, just different from the original.
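
A toy sketch of this kind of surgery, under loud assumptions: pi_digit is a lookup over the first ten decimal digits only (a stand-in for Pi(n), not a real digit computation), and make_agent and omega_surgery are hypothetical names. It shows that swapping the procedure for a constant yields ordinary, consistent code, while the original procedure is left untouched.

```python
PI_DECIMALS = "1415926535"  # first ten decimal digits of pi; enough for a toy n


def pi_digit(n: int) -> int:
    """Toy stand-in for Pi(n): the n-th decimal digit of pi, 1-indexed, n <= 10."""
    return int(PI_DECIMALS[n - 1])


def make_agent(digit_procedure):
    """Build an agent whose only knowledge of the digit comes from digit_procedure."""
    def agent(n: int) -> str:
        digit = digit_procedure(n)
        parity = "odd" if digit % 2 else "even"
        return f"believes Pi({n}) = {digit} ({parity})"
    return agent


def omega_surgery(replacement_digit: int):
    """Replace the call to the digit procedure with a constant (hypothetical surgery)."""
    return make_agent(lambda n: replacement_digit)


original_agent = make_agent(pi_digit)
modified_agent = omega_surgery(2)

print(original_agent(5))  # uses the real lookup
print(modified_agent(5))  # uses the substituted constant
print(pi_digit(5))        # the original procedure itself is unchanged
```

Because all of the agent's knowledge of the digit is routed through one procedure, the substitution is local, which is the property the Pi(n) construction relies on.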

[-]Wei Dai16y30

This form of Counterfactual Mugging may be instructive, as it slaughters the following false intuition, or equivalently conceptualization of "could": "the coin could land either way, but a logical truth couldn't be either way".

Perhaps this version of Counterfactual Mugging is not really about logical uncertainty, but rather uncertainty about one's source code. In UDT1, I assumed that an agent would know its own source code with certainty, but here, if we suppose that Omega does its counterfactual prediction using source-code surgery plus simulation, then our agent can't distinguish whether it's the agent with the original source code, or the one in the simulation with the modified source code.

Although I haven't worked out the details, it seems possible to modify UDT1 to handle uncertainty about one's source code, with the result that such an agent would give Omega $100 in this situation. Basically, when Is_Odd(Pi(n)) returns "true", you would think:

Did it return "true" because Pi(n) is odd, or because Pi(n) is even and my source code has been modified for it to return true anyway? I don't know, so I don't know whether Pi(n) is even or odd, and I better act as if I don't know.

This doesn't seem to require slaughtering the intuition that "a logical truth couldn't be either way" because I can think that a logical truth couldn't be either way but I just don't know which way it is, and that still allows me to make the right decision. Do you agree, or do you still think that intuition needs to go?
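
A small sketch of the reasoning in the quoted thought above, assuming (purely for illustration) that surgery flips what Is_Odd(Pi(n)) reports relative to the truth. The function reported_parity and the world encoding are hypothetical. It enumerates which combinations of "actual parity" and "was I modified" are consistent with seeing "true".

```python
from itertools import product


def reported_parity(actual_is_odd: bool, surgically_modified: bool) -> bool:
    """What Is_Odd(Pi(n)) returns, assuming surgery flips the reported value."""
    return actual_is_odd != surgically_modified


# Each world is a pair (actual_is_odd, surgically_modified).
worlds = list(product([True, False], repeat=2))

# Keep only the worlds in which the agent would observe "true".
consistent = [(odd, modified) for odd, modified in worlds
              if reported_parity(odd, modified)]

possible_parities = {odd for odd, _ in consistent}
print(consistent)
print(possible_parities)
```

Both parities survive the filter, so observing "true" does not settle whether Pi(n) is odd unless the agent can also rule out having been modified.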

[-]Vladimir_Nesov16y00

I'd say things differently now. I'd drop the distinction between "logical uncertainty" and uncertainty about the output of one's source code, as knowledge about a formal system basically is a program that you can run, which basically is part of your source code (maybe with observed data, but then the data became part of you -- what distinguishes you observing an event from the event observing you? -- it's more like merging). The important intuition in this case is that there is no transparency, that having the source code of a program is not at all the same thing as knowing how it behaves (it's not even about the halting problem, as simple calculations are still some computational steps away -- although static analysis (abstraction) may allow running infinitely faster). You are not uncertain about your source code, you are uncertain about what it'll do. Logical hypotheticals can be seen as playing the central role in decision-making, as the steps in proof search that suggest the steps that one's own (known) algorithms could do, and seeing whether they should be made real ("winning", in game semantics terminology, which is highly misleading from a goal-directed strategy point of view, as they only won your choice, not the "game"). While you can't reach some logical truths in a limited time, you can consider their hypothetical states, thus the program isn't so much being modified, as it is being refined where its consequences can't be directly observed (with a naive formalism the difference between the program and its effect blurs). I still have serious gaps in my understanding of this stuff, so am not ready to describe it yet.

This doesn't seem to require slaughtering the intuition that "a logical truth couldn't be either way" because I can think that a logical truth couldn't be either way but I just don't know which way it is, and that still allows me to make the right decision. Do you agree, or do you still think that intuition needs to go?

If things that "could" be done or "could" happen are ones considered in hypotheticals during decision-making, then logical truths (possible behaviors of a program) should be comfortable as things that could be either way.

[-]cousin_it16y10

Omega could have known the 10000th digit of pi beforehand, so now it has a strategy for reliably extracting money from you. Or do you place some other restriction on Omega's behavior that I'm not seeing?

[-]Vladimir_Nesov16y00

Like in the original problem, Omega's decision to run the game is independent of the game's parameters. Omega is not trying to win, it just implements the thought experiment.

[-]Johnicholas16y00

I can't tell what your recommended action is - do you give the $100 or not?

[-]Vladimir_Nesov16y00

I think this case is essentially the same as the original one, and this similarity is the topic of the post.

It looks like in the original case (and so this one) you should give the $100 if you are an AI running human preference, and most likely if you are a human too, unless human preference gets "updated" (corrupted) by the reflectively inconsistent human brain, so that once you learn about the new fact, the new preference says that you shouldn't give the $100, because the probability of the alternative dropped through the floor (in your representation).

[-]byrnema16y00

Where is the best place to read an explanation of why giving the $100 is what you "should" do? (Or could someone please summarize the rationale?)

[-]Vladimir_Nesov16y30

You can read the first thread: the post gives a short description of the theoretical reasons for giving up $100 (expected utility, reflective consistency), and there is more in the comments.

As I noted, I'm not sure it's what you really should do, as a human, but it looks like it. I changed my mind about this conclusion a couple of times since the problem statement, first believing that you should give up $100, because it was what UDT suggested, then that you shouldn't, remembering that the human brain probably does erase the counterfactual preference; now I'm back to being unsure about what goes on in the human brain, but trusting the normative theory as a better standard for decisions in the meantime.

[-]byrnema16y20

Reading through the comments of that post, I understood this to be the gist of the argument for why you would give up the $100:

Before knowing the outcome of the coin flip, you would have taken the wager to pay $100 for a 50% chance to win $10000. Alternatively, if Omega had asked you to "precommit" $100 in case you lost, you would still agree -- it's nearly exactly the same thing. (Technically it's an even better wager.) What if Omega asks you to precommit a witless future self? You would like to pre-commit your future self.

So you, your current self, while trying to decide whether to pay Omega or not, have decided that you would actually like to precommit a future self to paying the $100. How do you do that? By being that future person in the present and committing your current self to pay the $100. Indeed you lost, but being consistent with "being a payer" is what you decided you wanted.
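
The arithmetic behind the two wagers above, as a quick sketch. The dollar amounts and the 50% chance come from the comment; the function names are hypothetical, and treating dollars as utility is a simplifying assumption.

```python
def upfront_wager(p_win: float = 0.5, reward: float = 10_000, cost: float = 100) -> float:
    """Pay the cost unconditionally for a p_win chance at the reward."""
    return p_win * reward - cost


def precommitment(p_win: float = 0.5, reward: float = 10_000, cost: float = 100) -> float:
    """Pay the cost only in the losing branch; collect the reward in the winning one."""
    return p_win * reward - (1 - p_win) * cost


def refuse() -> float:
    """A policy of never paying gets nothing in either branch."""
    return 0.0


print(upfront_wager())   # expected dollars for paying $100 up front
print(precommitment())   # expected dollars for paying only after losing
print(refuse())
```

Under these assumptions the precommitment comes out $50 ahead of the upfront wager, which matches the remark that it is technically an even better wager, and both beat refusing when evaluated before the coin flip.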
