Jim Buhler

I am the main organizer of Effective Altruism Cambridge (UK), a group of people who are thinking hard about how to help others the most and address the world’s most pressing problems through their careers.

Previously, I worked at organizations such as EA France (community director), the Existential Risk Alliance (research fellow), and the Center on Long-Term Risk (events and community associate).

I've conducted research on various longtermist topics (some of it posted on the EA Forum) and recently finished a Master's in moral philosophy.

I've also written some stuff here on LessWrong.

You can give me anonymous feedback here. :)




I don't know, and I guess this is outside the scope of this post. There are a few organizations, like the Center on Long-Term Risk, studying cooperation and conflict between ASIs, however.

Interesting, thanks! This is relevant to question #2 in the post! Not sure everyone should act as if they were the first considering the downsides of interciv conflicts, but yeah, that's a good point.


Oh nice, thanks for this! I think I now see much more clearly why we're both confused about what the other thinks. 

Say Alice has epistemic algorithm A with inputs x that gives rise to beliefs b and Bob has a completely different algorithm A' with completely different inputs x' that happens to give rise to beliefs b as well. Alice and Bob both use decision algorithm D to make decisions. Part of b is the belief that Alice and Bob have the same beliefs and the same decision algorithm. It seems that Alice and Bob should cooperate.

(I'll respond using my definitions/framing which you don't share, so you might find this confusing, but hopefully, you'll understand what I mean and agree although you would frame/explain things very differently.)

Say Bob is CooperateBot.  Alice may believe she's decision-entangled with them, in which case she (subjectively) should cooperate, but that doesn't mean that their decisions are logically dependent (i.e., that her belief is warranted). If Alice changes her decision and defects, Bob's decision remains the same.  So unless Alice is also a CooperateBot, her belief b ("my decision and Bob's are logically dependent / entangled such that I must cooperate") is wrong. There is no decision-entanglement.  Just "coincidental" mutual cooperation. You can still argue that Alice should cooperate given that she believes b of course, but b is false. If only she could realize that, she would stop naively cooperating and get a higher payoff.
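Here is a minimal Python sketch of the counterfactual test described above. Everything in it is an illustrative assumption: the payoff numbers, the function names, and especially the choice to model Bob's move as a function of Alice's move, which is only a crude stand-in for logical dependence.

```python
from typing import Callable

Move = str  # "C" (cooperate) or "D" (defect)

# Assumed one-shot PD payoffs for Alice, keyed by (Alice's move, Bob's move).
ALICE_PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def cooperate_bot(alice_move: Move) -> Move:
    """Bob as CooperateBot: cooperates whatever Alice does."""
    return "C"


def perfect_copy(alice_move: Move) -> Move:
    """Bob as a perfect copy of Alice: his move always matches hers."""
    return alice_move


def is_entangled(bob: Callable[[Move], Move]) -> bool:
    """True if Bob's move tracks Alice's move in both counterfactuals."""
    return all(bob(move) == move for move in ("C", "D"))


def alice_payoff(alice_move: Move, bob: Callable[[Move], Move]) -> int:
    """Alice's payoff given her move and Bob's response to it."""
    return ALICE_PAYOFF[(alice_move, bob(alice_move))]


for bob in (cooperate_bot, perfect_copy):
    print(bob.__name__, is_entangled(bob),
          alice_payoff("C", bob), alice_payoff("D", bob))
```

Under these assumptions, defecting against `cooperate_bot` pays Alice 5 instead of 3, so her belief b costs her payoff. Against `perfect_copy`, defecting pays 1 instead of 3, so cooperating on the basis of b is warranted.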

So it seems that the whole A,x,A',x' stuff just doesn't matter for what they should do. It only matters what their beliefs are. 

It matters what their beliefs are to know what they will do, but two agents believing their decisions are logically dependent doesn't magically create logical dependency.  

If I play a one-shot PD against you and we both believe we should cooperate, that doesn't mean that we necessarily both defect in a counterfactual scenario where one of us believes they should defect (i.e., that doesn't mean there is decision-entanglement / logical dependency, i.e., that doesn't mean that our belief that we should cooperate is warranted, i.e., that doesn't mean that we're not two suckers cooperating for wrong reasons while we could be exploiting the other and avoid being exploited). And whether we necessarily both defect in a counterfactual scenario where one of us believes they should defect (i.e., whether we are decision-entangled) depends on how we came to our beliefs that our decisions are logically dependent and that we must cooperate (as illustrated -- in a certain way -- in my above figures).

(Of course, you need to have some requirement to the effect that Alice can't modify her beliefs in such a way that she defects without (non-causally) making it much more likely that Bob also defects. But I view this as an assumption about decision-theoretic, not epistemic, entanglement: I don't see why an epistemic algorithm (in the usual sense of the word) would make such self-modifications.)

After reading that, I'm really starting to think that we (at least mostly) agree but that we just use incompatible framings/definitions to explain things. 

Fwiw, while I see how my framing can seem unnecessarily confusing, I think yours is usually used/interpreted oversimplistically (by you but also and especially by others) and is therefore extremely conducive to motte-and-bailey fallacies[1], leading us to greatly underestimate the fragility of decision-entanglement. I might be confused though, of course.

Thanks a lot for your comment! I think I understand you much better now and it helped me reclarify things in my mind. :)

  1. ^

    E.g., it's easy to argue that widely different agents may converge on the exact same DT, but not if you include intricacies like the one in your last paragraph.

Interesting! Did thinking about those variants make you update your credences in SIA/SSA (or something else)?

(Btw, maybe it's worth adding the motivation for thinking about these problems in the intro of the post.) :)

Thanks a lot for these comments, Oscar! :)

I think something can't be both neat and so vague as to use a word like 'significant'.

I forgot to copy-paste a footnote clarifying that "as made explicit in the Appendix, what "significant" exactly means depends on the payoffs of the game"! Fixed. I agree this is vague, still, although I guess it has to be since the payoffs are unspecified?

In the EDT section of Perfect-copy PD, you replace some p's with q's and vice versa, but not all. Is there a principled reason for this? Maybe it is just a mistake and it should be U_Alice(p) = 4p - p^2 - p + 1 = 1 + 3p - p^2 and U_Bob(q) = 4q - q^2 - q + 1 = 1 + 3q - q^2.

Also a copy-pasting mistake. Thanks for catching it! :) 
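For anyone who wants to check the corrected formula numerically, here is a small Python sketch. The payoffs are not stated in this thread: T=5, R=3, P=1, S=0 is an assumption that happens to reproduce 1 + 3p - p^2 when both players cooperate with the same probability. The function names and the fixed q = 0.5 in the CDT-style case are likewise purely illustrative.

```python
# Assumed payoffs for Alice (not stated in this thread).
T, R, P, S = 5, 3, 1, 0


def u_alice(p: float, q: float) -> float:
    """Alice's expected payoff if she cooperates w.p. p and Bob w.p. q."""
    return R * p * q + S * p * (1 - q) + T * (1 - p) * q + P * (1 - p) * (1 - q)


def u_entangled(p: float) -> float:
    """EDT-style evaluation against a perfect copy: q is set equal to p."""
    return u_alice(p, p)


grid = [i / 100 for i in range(101)]

# The expanded form agrees with the closed form quoted above.
assert all(abs(u_entangled(p) - (1 + 3 * p - p * p)) < 1e-9 for p in grid)

# With q = p, utility is increasing on [0, 1], so full cooperation is best.
best_if_entangled = max(grid, key=u_entangled)

# CDT-style evaluation: q is held fixed (0.5 is an arbitrary choice).
best_if_q_fixed = max(grid, key=lambda p: u_alice(p, 0.5))

print(best_if_entangled, best_if_q_fixed)
```

Under these assumed payoffs, the optimum moves from p = 0 to p = 1 as soon as q is tied to p, which is the whole difference between ignoring and factoring in the entanglement.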

I am unconvinced of the utility of the concept of compatible decision theories. In my mind I am just thinking of it as 'entanglement can only happen if both players use decisions that allow for superrationality'. I am worried your framing would imply that two CDT players are entangled, when I think they are not; they just happen to both always defect.

This may be an unimportant detail, but -- interestingly -- I opted for this concept of "compatible DT" precisely because I wanted to imply that two CDT players may be decision-entangled! Say CDT-agent David plays a PD against a perfect copy of himself. Their decisions to defect are entangled, right? Whatever David does, his copy does the same (although David sort of "ignores" that when he makes his decision). David is very unlikely to be decision-entangled with any random CDT agent, however (in that case, the mutual defection is just a "coincidence" and is not due to some dependence between their respective reasoning/choices).  I didn't mean the concept of "decision-entanglement" to pre-assume superrationality. I want CDT-David to agree/admit that he is decision-entangled with his perfect copy. Nonetheless, since he doesn't buy superrationality, I know that he won't factor the decision-entanglement into his expected value optimization (he won't "factor in the possibility that p=q".) That's why you need significant credence in both decision-entanglement and superrationality to get cooperation, here. :)

Also, if decision-entanglement is an objective feature of the world, then I would think it shouldn't depend on what decision theory I personally hold. I could be a CDTer who happens to have a perfect copy and so be decision-entangled, while still refusing to believe in superrationality.

Agreed, but if you're a CDTer, you can't be decision-entangled with an EDTer, right? Say you're both told you're decision-entangled. What happens? Well, you don't care so you still defect while the EDTer cooperates. Different decisions. So... you two weren't entangled after all. The person who told you you were was mistaken.
So yes, decision-entanglement can't depend on your DT per se, but doesn't it have to depend on its "compatibility" with the other's for there to be any dependence between your algos/choices? How could a CDTer and an EDTer be decision-entangled in a PD?
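A toy Python sketch of the "compatibility" point, under heavy simplifying assumptions: each agent is reduced to a hypothetical function from "told we are entangled" to a move, and the CDT and EDT behaviours below are caricatures I made up for illustration, not faithful implementations of either theory.

```python
from typing import Callable

Agent = Callable[[bool], str]  # maps "told entangled?" to "C" or "D"


def cdt_agent(told_entangled: bool) -> str:
    """Assumed CDT behaviour in a one-shot PD: defect regardless."""
    return "D"


def edt_agent(told_entangled: bool) -> str:
    """Assumed EDT behaviour: cooperate only if it believes it is entangled."""
    return "C" if told_entangled else "D"


def outputs_match(a: Agent, b: Agent) -> bool:
    """True if both agents choose the same move under every announcement."""
    return all(a(told) == b(told) for told in (True, False))


pairs = {
    "CDT vs CDT": (cdt_agent, cdt_agent),
    "EDT vs EDT": (edt_agent, edt_agent),
    "CDT vs EDT": (cdt_agent, edt_agent),
}
for label, (a, b) in pairs.items():
    print(label, outputs_match(a, b))
```

Matching outputs is at best a necessary condition here: two unrelated CDT agents would also pass this check while their mutual defection is just a "coincidence". The sketch only shows why a CDTer and an EDTer fail it once both are told they are entangled.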

Not very confident about my answers. Feel free to object. :) And thanks for making me rethink my assumptions/definitions!