
Consider prisoner's dilemma with perfect deterministic twins.

Typically you assume that there's a 100% perfect clone, and that you have 100% perfect knowledge of the fact that they're a perfect clone. Together these let you arrive at common knowledge about each other and everything about the situation. (Check out the blue-eyes puzzle for a primer on iterating over knowledge, and on common knowledge.)

What happens when there's a clone with 99% physical similarity, and you know with 99% certainty that they're a clone with 99% similarity to you? Because this is what happens in the real world: I'm trying to figure out whether these decision theories cooperate in real-world situations, not just hypothetical ideal ones.

In the 99%-99% situation, each iteration of knowledge might have diminishing weight.

Alice: "There's a 99% probability the other person is 99% similar to me."

Alice: "99% similarity indicates Bob is 99.4% (assumed) likely to be capable of reasoning one iteration up. There's a 99.4% probability that Bob assigns a 99% probability to me being 99% similar to him."

Bob: "99% similarity indicates Alice is 99.4% (assumed) likely to be capable of reasoning one iteration up. There's a 99.4% probability that Alice assigns a 99% probability to me being 99% similar to her."

Alice: "99% similarity indicates Bob is 99.4% (assumed) likely to be capable of reasoning the second iteration. There's a 99.3% probability that Bob assigns a 99.4% probability that I assign a 99% probability to him being 99% similar to me."

...
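The chain above can be sketched as a toy model. This is one possible formalisation only, under an assumption the post doesn't commit to: that confidence in the n-th iteration of mutual knowledge decays geometrically, with the post's 0.99 similarity belief as the base and its assumed 0.994 per-iteration reasoning factor as the decay rate.

```python
# Toy model of diminishing confidence in iterated knowledge.
# Assumed numbers from the post: a 0.99 belief that the other party is
# 99% similar, and an (assumed) 0.994 chance, per iteration, that they
# can carry the reasoning one level further.
SIMILARITY_BELIEF = 0.99
ITERATION_FACTOR = 0.994

def nth_order_confidence(n: int) -> float:
    """Probability assigned to the n-th iteration of mutual knowledge."""
    return SIMILARITY_BELIEF * ITERATION_FACTOR ** n

for n in range(5):
    print(n, round(nth_order_confidence(n), 4))
```

The point the numbers illustrate: confidence never reaches zero, but true common knowledge (the limit of infinitely many iterations) is never attained either; each level of "I know that you know that..." carries slightly less weight.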

I feel these probabilities also impact the actual decision to cooperate or defect, but I'm unable to formalise it. Do they matter?


Charlie Steiner

### Oct 28, 2021


Sort of? Certainly if you have a probability distribution over algorithms the other person can be running, and you can do proof-based let's-call-it-TDT in the case of each algorithm, then you can do TDT to the probability distribution.
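A minimal sketch of "doing TDT to the probability distribution", under assumptions not in the comment: the opponent is a perfect twin (mirrors your move) with probability q, and otherwise is an ordinary defector; payoffs are the standard PD values T=5, R=3, P=1, S=0.

```python
# Twin prisoner's dilemma under uncertainty about twinhood.
# Assumption: with probability q the opponent mirrors my move exactly;
# with probability 1 - q they defect regardless. Standard payoffs assumed.
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker

def eu_cooperate(q: float) -> float:
    # Twin mirrors my cooperation (R); non-twin defects on me (S).
    return q * R + (1 - q) * S

def eu_defect(q: float) -> float:
    # Twin mirrors my defection (P); non-twin defects anyway (P).
    return P

def should_cooperate(q: float) -> bool:
    return eu_cooperate(q) > eu_defect(q)

print(should_cooperate(0.99))  # high confidence in twinhood -> cooperate
print(should_cooperate(0.2))   # too uncertain -> defect
```

Under these assumed payoffs the threshold is q > (P - S) / (R - S) = 1/3, so the post's 99% figures comfortably favour cooperation; the interesting question is how the iterated-knowledge uncertainty should feed into q.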

I think a more serious problem that might be implicated here is embedded agency. When we have information about a physical system, translating this into algorithms does not land us in exactly the epistemic situation of having a distribution over algorithms - multiple algorithms can all approximate the same system. And if you yourself are a finite physical system (and thus unable to ignore abstraction and plow ahead by pure prediction of sense data), this becomes important.

It's not like we have no options - you can still pick some way of representing the abstraction operation, and some way of calculating an expected-utility-analogue (e.g. by summing over some set of hypotheses without discrimination), and do something that is recognizably TDT-ish. But it's a fairly reasonable complication you might be thinking of if you want to be able to say that you and I are similar.

Yup this makes a ton of sense.

Probably why human societies often proactively seek out, and are wary of, humans whose values cause them to deviate from the norm - be it due to depression or narcissism or sociopathy or whatever. We need to be assured of a certain baseline similarity to coordinate towards societal goals. And detecting deviations is hard.


IDK how this plays into things, but convergence seems relevant. It's not clear what "similar" means here. Many kinds of difference wash out. E.g. Solomonoff inductors with different priors will converge (if the universe is computable). Proof-based cooperation is also sort of robust. Like, in real life, states don't reason about other states by any sort of detailed similarity, but rather assume that the other state wants certain convergent goals, such as not being consumed in a ball of fire. It's weird, because that argues that maybe cooperation is robust; but it also feels like there are good reasons for cooperation to be fragile. E.g. the argument you give. Or, more empirically: ambiguity about whether you need to prepare for adversarial situations makes you prepare for adversarial situations, which creates ambiguity about whether the other needs to do likewise.

Newcomb-like problems are common in real life. This seems to suggest that a high fidelity is not needed.

Thanks for replying, but I'm not sure I understood you. I am aware of most of these coordination problems and that humans often successfully solve them, but that does not mean they use TDT to solve them. Most people have never heard of TDT or the rationalist community.

I was surprised when I learned that Newcomb-like problems are common in real life.

It seems reasonable to compare a decision theory that tries to solve these to whatever intuitive means people use to solve them.

Yeah, it is reasonable, but I don't see people intuitively thinking "these beings are similar to me, so I can acausally impact their decisions just by doing X".

People simulate other people in their minds. They don't need to think 'they are similar to me". Simulating them in a way that is close to how they think themselves may be enough.

If the other person is not similar to me, then TDT doesn't apply, right? Say I simulate their action. I am now allowed to act differently from the other person, and my best move is to do so if conventional game theory suggests it. (CDT)
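The CDT move described here can be sketched in a few lines, again assuming standard PD payoff values (not given in the thread): once the opponent's move is treated as a fixed prediction rather than something correlated with my choice, defecting dominates.

```python
# CDT best response in a one-shot PD: given a (simulated) prediction of
# the opponent's move, pick whichever of my moves maximises my payoff.
# Standard payoff values assumed: (my move, their move) -> my payoff.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cdt_best_response(predicted_opponent_move: str) -> str:
    return max(("C", "D"),
               key=lambda m: PAYOFF[(m, predicted_opponent_move)])

print(cdt_best_response("C"))  # 'D' -- defect even against a predicted cooperator
print(cdt_best_response("D"))  # 'D'
```

Defection is the answer for either predicted move, which is exactly why the similarity assumption matters: TDT's case for cooperating rests on my choice and the twin's choice not being independent variables.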