Further considerations on the Evidentialist's Wager

Martín Soto

Thanks to Caspar Oesterheld for the discussion on which this post is largely based.

In a previous post I presented an objection to The Evidentialist's Wager. Reading it is probably necessary to understand the following.

A counter-argument to my objection

In the post I broadly claim the following:

Imagine we have absolutely no idea whether more acausally correlated agents in the universe are positively correlated to us/Good Twins (our one-boxing is evidence for them increasing our utility function) or negatively correlated to us/Evil Twins (our one-boxing is evidence for them decreasing our utility function). That is, our credence on the two events is 50%^[1]. Then, when facing a Newcomb decision, the stakes for EDT and CDT are equal. This is because any evidence of additional utility gains provided by EDT will balance out in the expected value calculation, given our complete uncertainty about the interpretation of that evidence (whether one-boxing is evidence for a universal increase or decrease in our utility).
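A minimal sketch of this balancing-out, not taken from the original post: it assumes every correlated agent contributes the same magnitude of utility, with the sign flipped for Evil Twins. The function name `edt_acausal_term` and all the numbers are hypothetical and chosen only for illustration.

```python
def edt_acausal_term(credence_good, n_correlated, gain_per_agent):
    """Expected extra utility that one-boxing is evidence for, under EDT.

    Assumption (illustrative only): each correlated agent is a Good Twin with
    probability `credence_good` (our one-boxing is evidence of +gain) and an
    Evil Twin otherwise (our one-boxing is evidence of -gain).
    """
    credence_evil = 1.0 - credence_good
    return n_correlated * gain_per_agent * (credence_good - credence_evil)


# Hypothetical numbers: a million correlated agents, 10 units of utility each.
for credence in (0.5, 0.6, 0.4):
    term = edt_acausal_term(credence, n_correlated=1_000_000, gain_per_agent=10.0)
    print(f"credence in Good Twins = {credence}: acausal term = {term}")
```

Under these assumptions the acausal term vanishes exactly at a credence of 0.5, however many correlated agents there are, and it grows with their number as soon as the credence moves in either direction.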

Shortly after writing the post, I discovered the following counter-argument: In that situation, EDT urges you to research further whether more positively or negatively correlated agents exist, to break the symmetry and then act accordingly. That is, it is plausible (it has non-zero probability) that dedicating more resources to studying this issue ends up breaking the symmetry, and changing your credences away from 50-50 in one of the two possible directions. If that happens, then the Wager applies and the stakes for EDT will be higher (which will urge you either to one-box or to two-box, depending on how the symmetry was broken). And so, even in the described situation, the stakes of EDT are higher, and if this is not immediately obvious, it is only because EDT recommends neither one-boxing nor two-boxing, but the third option of researching further into the nature of the multiverse.

But the above argument has a flaw, related to other issues with the concept of negative correlation, which is generally problematic and ill-defined (as I pointed out in the first post, even the definition of negatively correlated agents is unclear, but that is not an issue for what follows).

Considering meta-reasoning

Suppose the correlation between us and the Good Twins is perfectly positive (identical copies), and with the Evil Twins is perfectly negative (identical copies with flipped utility function). Then they will also be in the 50-50 situation and appreciate the EDT urge to research further, and so will research further. Of course, both groups will obtain different results from their research (if it's carried out correctly). That is, imagine the real distribution between Good Twins and Evil Twins is 60-40. Then the Good Twins will receive evidence that they are a majority, and the Evil Twins that they are a minority. And here, the acausal correlation is broken: the Evil Twins will no longer employ decision theory in the same way as us, because they have obtained some different evidence.

Naively (as in the above counter-argument), the Evil Twins might conclude: "Aha, so I should two-box (even if that's bad for me and all other Evil Twins), because then all Good Twins (of which there are more) will also two-box, so that's better for my utility function (provided EDT is correct, of course)". But they'll shortly notice that their correlation to the Good Twins has been broken, and so their actions no longer provide evidence about the Good Twins' actions. So they should consider only their correlated agents (all Evil Twins), and act accordingly (one-box). And of course, the Good Twins will also one-box (they would have even if the correlation to Evil Twins had somehow magically been preserved, because they're the majority).

That is, in breaking the symmetry we have also broken the correlation to all Evil Twins, since both we and the Evil Twins were studying the same metric, but with opposite consequences for our actions. And so, we can't just "do the research and then one-box or two-box accordingly", because doing the research itself is an action that provides evidence (more on this below).

Here's what just happened to our argument. Originally we were only considering an idealized scenario, with a binary decision to take: you face a Newcomb problem, and your only two possible actions are to one-box or two-box. And sure, in this scenario, given a 50-50 prior on the Twins, EDT and CDT will hold the same stakes. But when you take into account further actions which are obviously always available in any real-world situation^[2] (some of which are purely computational, such as just sitting there and thinking about the problem for some minutes with your physical brain), then it is plausible that this perfect symmetry breaks (even if by the slightest margin), and so the EDT high stakes return. But then, you notice that, upon going up to this meta-reasoning/partaking in this research, your correlated agents have also done so, and as a consequence some of them are no longer correlated. And going up to this meta-consideration once again, they will also have noticed this, and will act accordingly (so you already have the evidence that they will one-box, whatever you do). Notice this means (apparently paradoxically) that all agents, after carrying out this reasoning, will one-box. But they do so (or at least the Evil Twins do) strictly because they notice they are no longer correlated to the others. That is, after carrying out this reasoning, the Evil Twins already have the full evidence that Good Twins will one-box. And so, two-boxing would only screw them over, and they one-box as well.

In other words, the Evil Twins would have liked to stop the research/reasoning from the start (or before arriving at conclusions that inform actions taken), because this would provide evidence that the Good Twins have stopped as well (or more strictly speaking, it wouldn't have provided Evil Twins with full evidence that the Good Twins will one-box). But of course, by the time they decided to do research (without yet knowing they would be the minority), this already provided evidence that the Good Twins were doing the same. And by the time they received negative results, this provided evidence that the Good Twins received positive results^[3].

But then, should you research?

By following the above argument, every agent with a credence of 50-50 will know that, if it partakes in research, then it can either conclude it's in the majority (and that the majority will know this and one-box) and so obtain evidence that its utility will be higher than expected, or conclude it's in the minority (and that the majority will know this and one-box) and so obtain evidence that its utility will be lower than expected. So it is both hopeful and fearful of partaking in research. I feel like these two considerations (or more concretely, expected utility calculations) will again perfectly balance out (but now on the meta-level, or possibly jumping to ever higher meta-levels), and so indeed an agent with credences of 50-50 will really just go with their best-guess decision theory (without EDT presenting higher stakes). Or at least, from their perspective this will have as much (inter-theoretic) expected utility as doing the research^[4].
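One way to make the "hope and fear balance out" intuition concrete is conservation of expected evidence: before doing the research, the expected value of your posterior credence equals your prior. The sketch below is not from the original post. It assumes the research yields a single binary signal ("majority" or "minority") that is correct with some probability `accuracy`, and the function name and numbers are hypothetical.

```python
def research_outcomes(prior_majority, accuracy):
    """Bayesian update on a binary research signal.

    Assumption (illustrative only): the signal reports the truth with
    probability `accuracy`, whichever group the agent is actually in.
    Returns a list of (probability of the signal, posterior of being in the
    majority given that signal) pairs.
    """
    p_signal_majority = prior_majority * accuracy + (1 - prior_majority) * (1 - accuracy)
    p_signal_minority = 1 - p_signal_majority
    posterior_if_majority_signal = prior_majority * accuracy / p_signal_majority
    posterior_if_minority_signal = prior_majority * (1 - accuracy) / p_signal_minority
    return [
        (p_signal_majority, posterior_if_majority_signal),
        (p_signal_minority, posterior_if_minority_signal),
    ]


outcomes = research_outcomes(prior_majority=0.5, accuracy=0.8)
expected_posterior = sum(p_signal * posterior for p_signal, posterior in outcomes)
print(outcomes)
print("expected posterior:", expected_posterior)
```

The good news and the bad news are weighted so that the expected posterior is the prior again. This only carries over to expected utility under the further assumption that the utility being evidenced is linear in that credence, which the argument above does not establish.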

One-boxing as a fixed point

But wait! We can include another practical consideration. An agent having exactly 50-50 credence might be as unlikely as the universe containing exactly 50-50 Twins (at least, for agents good enough at keeping track of precise probabilities). And indeed, even an agent with a 50-50 credence will very surely at some time receive some (maybe collateral and apparently uncorrelated) evidence that updates its credence on this issue and breaks the symmetry. By the above argument, by the time the symmetry has been broken (even if in this unintended manner) the correlation to Evil Twins will have been broken (they will have received the opposite update, and act accordingly). So from there on, the agent has evidence that all of its correlated agents are Good Twins, and will always apply the Wager and one-box (regardless of whether it is in the original majority or minority).^[5]

That is, this seems to indicate that (in the perfect correlation case) even the slightest evidence in any one of the two directions will prove forever (for the agent) that all correlated agents are Good Twins.

Some practical questions

On a related note (if we drop the assumption that all correlations are perfect), might it be that, the more research an agent carries out, the more certain it can be of being correlated only to ever-more-Good Twins? Imagine you get a small piece of evidence that you are the majority. If the research process is correct, most agents still correlated to you (that is, who have received the same evidence as you) are actually in the majority. But there might be some agents in the minority that, even following the same correct research process, because of some contingent bad luck, obtain evidence of being the majority, and so are still correlated to you. This would seem to be less probable the bigger the evidence received.
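A rough sketch of why bad luck should become rarer as evidence accumulates, not taken from the original post. It assumes "more evidence" means more independent noisy observations, each correct with probability `accuracy`, and that an agent concludes whatever most of its observations say (odd numbers of observations avoid ties). The 60-40 split reuses the example above. The function names are hypothetical.

```python
from math import comb


def misleading_evidence_probability(n_observations, accuracy):
    """Chance that a majority vote over noisy observations points the wrong way.

    The vote is wrong when at most (n - 1) // 2 observations are correct
    (n_observations is assumed odd).
    """
    max_correct_for_wrong_vote = (n_observations - 1) // 2
    return sum(
        comb(n_observations, k) * accuracy**k * (1 - accuracy) ** (n_observations - k)
        for k in range(max_correct_for_wrong_vote + 1)
    )


def true_majority_share_among_same_evidence(majority_share, n_observations, accuracy):
    """Among agents whose evidence says 'majority', the fraction truly in the majority."""
    err = misleading_evidence_probability(n_observations, accuracy)
    truly_majority = majority_share * (1 - err)
    unlucky_minority = (1 - majority_share) * err
    return truly_majority / (truly_majority + unlucky_minority)


# Hypothetical numbers: a 60-40 split and observations that are right 70% of the time.
for n in (1, 5, 25):
    share = true_majority_share_among_same_evidence(0.6, n, accuracy=0.7)
    print(f"{n} observations: share truly in the majority = {share:.4f}")
```

In this toy model the share of same-evidence agents who are truly in the majority rises towards 1 as the number of observations grows, but it never reaches 1 for any finite amount of evidence.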

Might there be some situation in which an agent wants to ensure all of its correlates are Good Twins, and so should partake in more research before taking any other action? Maybe the fear of being the one with contingently bad luck (and so being correlated only to your Evil Twins) will always balance out the further security of being correct (and so correlated only to your Good Twins)^[6], so that the amount of research ends up not mattering (which would be counter-intuitive)?

The authors' actual counter-argument

Although I found the above ideas interesting, the easiest way out of my objection is just noticing that our credences should not be 50-50, but way more positive, which is Caspar's (and apparently the other authors') position.

Indeed, there are some solid basic arguments favoring Good Twins:

Anthropically, our existence provides evidence for them being favored.
It seems plausible that evolutionary pressures select for utility functions broadly as ours, although by fragility of value we might need very precise correlation (but this might still happen, even if less).
On a related note, even if very different evolutionary processes yield very different utility functions, it might be that there's a physical correlation (because of "brain" architecture or how physical contexts arise) between the decision theory (or other mechanisms) of an agent and its values.

In my first post, I mostly assessed 50-50 as plausible because of the potential craziness of digital minds (many of which could be negatively correlated to us). But I'd still need an argument to defend the future existence of more negatively than positively correlated digital minds, and I don't have one. In fact, there are some obvious reasons to expect most digital minds in some futures to be positively correlated (those in which we solve Alignment). On the contrary, I don't see a clear reason to expect many negatively correlated minds in almost any future. This could be the case in scenarios with extortion/malevolent actors, but I find them even less likely, since they probably require the existence of another, approximately as intelligent actor positively correlated to us. This is not only conjunctive, but probably requires us solving Alignment but still facing an extorting/malevolent AGI, which seems improbable.

For further considerations on negative correlations and the probable values of superrational agents, see Caspar's Multiverse-wide Cooperation via Correlated Decision Making sections 2.6.2 and 3.4 (thanks to Sylvester Kollin for this recommendation!).

An unrelated after-thought: choosing the correct decision theory

In the original article, the authors convincingly argue for the reasonableness of hedging under decision-theoretical uncertainty. But some worries remain about the coherence of this whole approach, and especially the concept of there being a "correct" decision theory, and us being able to somehow amass evidence (or carry out computations) to improve our guess as to which is the correct one.

The authors address, given uncertainty about decision theories, how to carry out intertheoretical value comparisons. But they don't address how to compare the theories themselves, as theories of instrumental rationality (which should be value independent).

Indeed, suppose you have non-zero credence in both EDT and CDT. What would it mean (for you, subjectively) for one of them to be the "correct" decision theory? Arguably, for it to better maximize your goals (with respect to other theories). But of course, to compare such maximizations, you already need a decision theory (which tells you what "maximizing your goals" even is).

That is, you should just choose the decision theory $D$ such that the action "I follow $D$" maximizes your utility function. But different theories will assess differently what constitutes an action maximizing your utility function. For instance, for CDT what matters is how the action causally affects world states, while for EDT what matters is the evidence that taking the action provides about world states.

Might it be that choosing the correct decision theory can only come down to a matter of intuition or aesthetics, or even that it should be regarded as a preference, just as your utility function? It would seem intuitively like this kind of decision should be somehow justified on practical grounds.

Annex: EDT being counter-intuitive?

As evidenced above, EDT agents might sometimes prefer not to receive undesirable evidence. For instance, say piece of evidence A proves that there is suffering in the world (as opposed to none) and that Alice can take some effort to prevent some (but not all) of it. Then, even if that is the real state of the world, it would naively seem like Alice would rather not receive this piece of evidence (that is, her utility would get maximized that way, since her utility isn't calculated as some physical phenomenon anchored in the external world, but as the evidence she receives).

Of course, this is just a mistake of assessing EDT from the outside of the agent's perspective, when it is explicitly construed as a theory for subjective decision making.

That is, if Alice has no way to know that this evidence exists, then she is not making any mistake (if she can't know it exists, she can't prevent suffering either). If on the contrary she has a way to know this (and has healthy enough epistemic practices as to notice this), then she knows that $A$ might be true, so her expected utility is an average over $A$ and $\neg A$, and finding out whether $A$ holds will, in expectation, maximize her utility (because if it turns out that $A$, she will be able to lower the amount of suffering).
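A toy value-of-information calculation for Alice, not taken from the original post. It assumes that finding out whether A holds is free, that utility is simply minus the remaining suffering minus any effort spent, and that all the numbers and function names are hypothetical.

```python
def expected_utility_without_finding_out(p_a, suffering, preventable, effort_cost):
    """Alice must choose blindly: pay the effort regardless, or do nothing."""
    act_blindly = p_a * -(suffering - preventable) - effort_cost
    do_nothing = p_a * -suffering
    return max(act_blindly, do_nothing)


def expected_utility_with_finding_out(p_a, suffering, preventable, effort_cost):
    """Alice first learns whether A holds, then acts only if that is worthwhile."""
    best_if_a = max(-(suffering - preventable) - effort_cost, -suffering)
    best_if_not_a = 0.0  # no suffering exists, so no effort is spent
    return p_a * best_if_a + (1 - p_a) * best_if_not_a


# Hypothetical numbers: 30% credence in A, 100 units of suffering, 40 preventable, effort costs 5.
args = dict(p_a=0.3, suffering=100.0, preventable=40.0, effort_cost=5.0)
print("without finding out:", expected_utility_without_finding_out(**args))
print("with finding out:   ", expected_utility_with_finding_out(**args))
```

In this sketch Alice's expected utility before looking already averages over A and not-A, so avoiding the evidence does not raise it, and finding out never does worse in expectation than choosing blindly.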

Of course, the boundaries of "having a way to know" and "healthy enough epistemic practices" are fuzzy, and lead to considerations like whether failing to correctly assess a certain piece of evidence counts as ethically incorrect or even impermissible. And so an evidential framework (completely robust in theory) could seem to be prone to some undesirable consequences in imperfect practice, like wishful thinking (even if real-world utilitarians are used to avoiding such failure modes in their most obvious appearances).

^[1]
Or more correctly for expected value calculation purposes, if these amounts are $P$ and $N$, and $C$ is our credence function on events, then for every natural number $n$ (or real number, to more generally take into account the different utilities contributed by different agents), $C(P = N + n) = C(N = P + n)$.
^[2]
We might imagine a situation in which an intelligence genuinely only has available internal computations before deciding, and furthermore it is allowed to think for exactly the time it needs to arrive at the 50-50 same-stakes consideration, but not at the further meta-reasoning about the need to partake in research (that is exactly what happened to me with the last post!). In that logically bounded case, the agent will indeed posit equal stakes in both EDT and CDT, and so will just act according to its best-guess theory. But of course, all of these ideas are more clearly applied to way less bounded agents, for which the probability of this is negligible (unless another agent has for some reason adversarially calculated and implemented that exact time on purpose).
^[3]
We might imagine some situations in which the research results aren't negatively correlated in that way. Maybe for some physical or theoretical reason evidence of the existence of Evil Twins is way easier to find than evidence of the existence of Good Twins. But of course, the only way in which Evil Twins would be able to exploit this fact is if they knew, and this would provide evidence that Good Twins also know (and so can adjust their estimates accordingly).
We might also ask whether imperfect correlation can get around this issue. Concretely, we would need the other agents to be correlated enough as to carry out this whole top-level reasoning, but not correlated enough as to carry out research in the same way as us. Not only does this seem implausible, but again we can only exploit this fact if we have some reason to think their research will drive them in a particular (broad) direction or to particular conclusions and actions. And if we know this, they do too. Although, might it be that for some reason Evil Twins are way easier to predict/model than Good Twins, and so one group can predict the other but not conversely? Again, this would break a certain symmetry, and so we'd need the other agents to be correlated enough as to carry out this whole top-level reasoning, but not correlated enough as to be equally predictable. Which seems even less likely. (Or maybe we are just degenerating the situation into one group predicting another, and so the one-way evidence has already come regardless of our actions, and no acausal trade occurs.)
^[4]
But then, they will choose to just go with their best guess, because this wastes fewer of their resources, right? Well, not exactly, because if EDT is true this will also waste resources of their Evil Twins, which allegedly maximizes their utility (the Evil Twins have fewer resources to minimize it). The strength of this consideration of course depends on the agent's credence in EDT, and I feel like it should wash away as well, leaving the two options (going with the best-guess decision theory or doing research on the existence of Twins) literally equally valuable (or maybe doing research would be as valuable as following EDT and one-boxing). But this feels weird.
^[5]
Maybe I could also say "even an agent in the 50-50 state will contemplate this argument, and so put high probability on it changing opinion, and so one-box straight away, and so actually all agents apply the Wager and the 50-50 agent has no one to trade with, and knows that". But this argument is circular: I'm just restating "it is a priori very unlikely that 50-50 is right, so agents will have a strong prior against that", but it could still be that, even including these considerations, an agent is completely or almost certain of 50-50 being true.
^[6]
Maybe this only happens in the literally zero-sum game when both utility functions are literally opposite.

Caspar Oesterheld (1y)

>Anthropically, our existence provides evidence for them being favored.

There are some complications here. It depends a bit on how you make anthropic updates (if you do them at all). But it turns out that the version of updating that "works" with EDT basically doesn't make the update that you're in the majority. See my draft on decision making with anthropic updates.

>Annex: EDT being counter-intuitive?

I mean, in regular probability calculus, this is all unproblematic, right? Because of the Tower Rule, a.k.a. the law of total expectation, or similarly conservation of expected evidence. There are also issues of updatelessness, though, which you touch on at various places in the post. E.g., see Almond's "lack of knowledge is [evidential] power" or scenarios like the Transparent Newcomb's problem wherein EDT wants to prevent itself from seeing the content of the boxes.

>It seems plausible that evolutionary pressures select for utility functions broadly as ours

Well, at least in some ways similar to ours, right? On questions like whether rooms are better painted red or green, I assume there isn't much reason to expect convergence. But on questions of whether happiness is better than suffering, I think one should expect evolved agents to mostly give the right answers.

>to compare such maximizations, you already need a decision theory (which tells you what "maximizing your goals" even is).

Incidentally I published a blog post about this only a few weeks ago (which will probably not contain any ideas that are new to you).

>Might there be some situation in which an agent wants to ensure all of its correlates are Good Twins

I don't think this is possible.

Martín Soto (1y)

Thank you for your comment! I hadn't had the time to read de se choice but am looking forward to it! Thank you also for the other recommendations.

>conservation of expected evidence

Yep! That was also my intuition behind "all meta-updates (hopes and fears) balancing out" above.

If you mean it's not possible to ensure all your correlates are Good, I don't see how doing more research about the question doesn't get you ever closer to that (even if you never reach the ideal limit of literally all correlates being Good).

If you mean no one would want to do that, it might seem like you'd be happy to be uncorrelated from your Evil Twins. But this might again be a naïve view that breaks upon considering meta-reasoning.

shminux (1y)

In The Evidentialist's Wager there is no non-zero-probability world where you get more than 10 doses of the cure. Why bother discussing zero-probability (impossible) worlds?

Martín Soto (1y)

I'm not sure I understand your comment. It is true that in their framing of the Moral Newcomb problem you can at most get 10 cures (because the predictor is perfectly reliable). But what you care about (your utility to maximize) is not only how many cures you personally receive, but how many such cures people similar to you (in other parts of the universe) receive (because allegedly you care about maximizing happiness or people not dying, and obtaining your 10 cures is only instrumental for that). And of course that utility is not necessarily bounded by the 10 cures you personally receive, and can be way bigger if your action provides evidence that many such cures are being obtained across the universe. The authors explain this on page 4:

This means that the simple state-consequence matrix above does not in fact capture everything that is relevant to the decision problem: we have to refine the state space so that it also describes whether or not correlated agents face boxes with cures in both. By taking one box, you gain evidence not only that you will obtain more doses of the cure, but also that these other agents will achieve good outcomes too. Therefore, the existence of correlated agents has the effect of increasing the stakes for EDT.

shminux (1y)

Thanks, I guess I don't really understand what the authors are trying to do here.

Caspar Oesterheld (1y)

I guess it's too late for this comment (no worries if you don't feel like replying!), but are you basically saying that CDT doesn't make sense because it considers impossible/zero-probability worlds (such as the one where you get 11 doses)?

If so: I agree! The paper on the evidentialist's wager assumes that you should/want to hedge between CDT and EDT, given that the issue is contentious.

Does that make sense / relate at all to your question?

Not "CDT does not make sense", but any argument that fights a hypothetical such as "predictor knows what you will do" is silly. EDT does that sometimes. I don't understand FDT (not sure anyone does, since people keep arguing about what it predicts), so maybe it fares better. Two-boxing in a perfect predictor setup is a classic example. You can change the problem, but it will not be the same problem. The 11-dose outcome is not a possibility in the Moral Newcomb's. I've been shouting into the void for a decade that all you need to do is enumerate the worlds, assign probabilities, calculate expected utility. You throw away silliness like "dominant strategies"; they are not applicable in twin PD, Newcomb's, Smoking Lesion, Parfit's Hitchhiker, etc. "Decision" is not a primitive concept, but an emergent one. The correct question to ask is "given an agent's actual actions (not thoughts, not decisions), what is the EV, and what kind of actions maximize it?" I wrote a detailed post about it, but it whooshed. People constantly and unwittingly try to smuggle libertarian free will into their logic.