Eliezer Yudkowsky wrote, in a place where I can't ask a follow-up question:

  • A rational agent should always do at least as well for itself as a rock, unless it's up against some other agent that specifically wants to punish particular decision algorithms and will pay costs itself to do that; just doing what a rock does isn't very expensive or complicated, so a rational agent which isn't doing better than a rock should just behave like a rock instead. An agent benefits from building into itself a capacity to respond to positive-sum trade offers; it doesn't benefit from building into itself a capacity to respond to threats.

  • Consider the Nuclear Prisoner's Dilemma, in which as well as Cooperate and Defect there's a third option called Nuke, which if either player presses it causes both players to get (-100, -100). Suppose that both players are programs each allowed to look at each other's source code (a la our paper "Robust Cooperation in the Prisoner's Dilemma"), or political players with track records of doing what they say. If you're up against a naive counterparty, you can threaten to press Nuke unless the opponent presses Cooperate (in which case you press Defect). But you'd have no reason to ever press Nuke if you were facing a rock; the only reason you'd ever set up a strategy of conditionally pressing Nuke is because of a prediction about how your opponent would respond in a complicated way to that strategy by their pressing Cooperate (even though you would then press Defect, and they'd know that). So a rational agent does not want to build into itself the capacity to respond to threats of Nuke by choosing Cooperate (against Defect); it would rather be a rock. It does want to build into itself a capacity to move from Defect-Defect to Cooperate-Cooperate, if both programs know the other's code, or two entities with track records can negotiate.

Well, what if I told you that I had a perfectly good reason to become someone who would threaten to nuke Defection Rock: I want to make it clear that agents which self-modify into a rock get nuked anyway, so there's no advantage to adopting any strategy other than playing Cooperate while I play Defect. I want to keep my other victims convinced that surrendering to me is their best option, and nuking the occasional rock is a price I'm willing to pay to achieve that. In other words, I've transformed the game you're playing from the Prisoner's Dilemma into Hawk/Dove, and I'm a rock that always plays Hawk. So what does LDT have to say about that? Are you going to use a strategy that plays "Hawk" (anything other than Cooperate) against a rock that always plays Hawk and gets us both nuked, or are you going to do the sensible thing and play Dove (Cooperate)?
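For concreteness, here's a minimal sketch of the position I'm claiming to put you in, with illustrative payoffs of my own choosing (only the ordering of the numbers matters):

```python
# Illustrative payoffs (my own numbers; only the ordering matters):
# mutual cooperation 10, mutual defection 5, the extorted cooperator 2,
# the successful extorter 15, and -100 each if anyone presses Nuke.
PAYOFFS = {
    ("C", "C"): (10, 10),
    ("C", "D"): (2, 15),
    ("D", "C"): (15, 2),
    ("D", "D"): (5, 5),
}

def outcome(you, me):
    """Payoff pair (you, me); a Nuke by either side overrides everything else."""
    if "N" in (you, me):
        return (-100, -100)
    return PAYOFFS[(you, me)]

def hawkbot(your_move):
    """Me, the rock that always plays Hawk: Defect if you Cooperate, Nuke otherwise."""
    return "D" if your_move == "C" else "N"

for your_move in ("C", "D", "N"):
    print(your_move, outcome(your_move, hawkbot(your_move)))
# C (2, 15)      <- you surrender, I collect
# D (-100, -100) <- we both burn
# N (-100, -100) <- we both burn
```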

quetzal_rainbow

Let's suppose you look at the code of your counterparty and it says "I'll Nuke you unless you Cooperate, in which case I Defect"; call it "extortionist". You have two hypotheses here:

  1. Your counterparty deliberately modified its decision-making procedure in the hope of extorting more utility;
  2. This decision-making procedure is a genuine result of some weird evolutionary/learning process.

If you can't actually get any evidence favoring one hypothesis over the other, you go with your prior and do whatever is best from the standpoint of the UDT/FDT/LDT counterfactual operationalization. E.g., suppose the payoffs are:

  1. Cooperate:Cooperate - 10:10
  2. Cooperate:Defect - ~~20:2~~ 15:2 (the cooperating, i.e. extorted, side gets 2)
  3. Defect:Defect - 5:5
  4. Nuke - -100:-100.

You are playing against an extortionist counterparty. The prior probability that the extortionist comes from hypothesis 2 is x. Extortionists from hypothesis 1 can perfectly predict your response when deciding whether to self-modify and cover their tracks; if they decide not to self-modify, they cooperate conditional on your cooperation. Let's call the policy "Nuke the extortionist, cooperate with non-extortionists" $\pi_N$, and "Cooperate with both" $\pi_C$.

From here, your UDT-expected utilities are:

$EU(\pi_N) = x \cdot (-100) + (1-x) \cdot 10$

$EU(\pi_C) = x \cdot 2 + (1-x) \cdot 2 = 2$

Therefore, you should choose $\pi_N$ if

$x \cdot (-100) + (1-x) \cdot 10 > 2$

i.e.

$x < \frac{8}{110} \approx 7.27\%$
And 7.27% is a really high frequency of "natural" extortionists; I wouldn't expect it to be that high.
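A quick numeric check of the threshold, as a sketch; the only numbers that matter are the 10 from mutual cooperation, the 2 you get when extorted, and the -100 from being nuked:

```python
# Sketch: expected utility of the two policies as a function of x, the prior
# probability that an observed extortionist is "natural" (hypothesis 2)
# rather than a self-modified predictor (hypothesis 1).
CC, EXTORTED, NUKED = 10, 2, -100

def eu_refuse(x):
    # pi_N: deliberate extortionists predict refusal and never self-modify,
    # so with probability (1 - x) you get mutual cooperation instead;
    # natural extortionists nuke you anyway.
    return x * NUKED + (1 - x) * CC

def eu_give_in(x):
    # pi_C: both kinds of extortionist extort you; you always get 2.
    return EXTORTED

threshold = (CC - EXTORTED) / (CC - NUKED)  # 8 / 110
print(round(threshold, 4))                  # 0.0727
print(eu_refuse(0.05) > eu_give_in(0.05))   # True: refuse when x is below the threshold
print(eu_refuse(0.10) > eu_give_in(0.10))   # False: give in when x is above it
```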

Minor note. Your choice of utilities makes a 50/50 mixture of Cooperate:Defect and Defect:Cooperate better than the Cooperate:Cooperate outcome, so Cooperate:Cooperate isn't on the Pareto frontier.

Does any process in which they ended up the way they did without considering your decision procedure count as #2? Like, suppose almost all the other agents it expects to encounter are CDT agents that do give in to extortion, and it thinks the risk of nuclear war with the occasional rock or UDT agent is worth it.

quetzal_rainbow
Given this particular setup (you both see each other's source code and make decisions simultaneously, with no way to verify the counterparty's choice until the outcomes happen), you shouldn't self-modify into an extortionist, because CDT agents always defect: no amount of reasoning about source code can causally affect your decision, and D-D is the Nash equilibrium. CDT agents that expect with high probability to meet extortionists in the future can self-modify into a weird Son-of-CDT agent that gives in to extortion, but for this setup to work in any non-trivial way you need to be at least EDT-ish. But yes, the general principle here is "evaluate how much the other player's decision procedure is logically influenced by my decision procedure, calculate expected value, act accordingly". The same applies when you are deciding about self-modification. For example, if you think that modifying into an extortionist is a good policy, you can end up in a situation where everyone is an extortionist and everybody nukes each other.

Ape in the coat

The optimal strategy seems to be Prudent Extorter:

  1. Extort agents vulnerable to extortion
  2. Cooperate with agents that are not vulnerable to extortion and will cooperate back.
  3. Defect against everyone else.

Such agents perform better than your Naive Extorter: they can cooperate with each other, and they don't nuke a Defection Rock.

When Naive Extorter meets Prudent Extorter they die in a nuclear fire. 
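A minimal sketch of the decision rule, with opponent "types" standing in for whatever the agent reads off the counterparty's source code (the labels are illustrative, not from the answer):

```python
# Sketch: the three rules of Prudent Extorter, with opponent "types" standing
# in for whatever it can prove about the counterparty's source code.
def prudent_extorter(opponent_type):
    if opponent_type == "caves_to_extortion":
        return "Extort"      # 1. extort agents vulnerable to extortion
    if opponent_type in ("prudent_extorter", "reciprocal_cooperator"):
        return "Cooperate"   # 2. cooperate with non-exploitable reciprocators
    return "Defect"          # 3. defect against everyone else

for opp in ("caves_to_extortion", "prudent_extorter", "reciprocal_cooperator",
            "defection_rock", "naive_extorter"):
    print(opp, "->", prudent_extorter(opp))
# Against Defection Rock it merely defects (no pointless Nuke); against a
# Naive Extorter it refuses to cave, the Naive Extorter carries out its
# threat, and both take the -100.
```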

CronoDAS

I have actually found an example of a strategy that doesn't incentivize someone else to self-modify into Hawkbot: https://www.lesswrong.com/posts/TXbFFYpNWDmEmHevp/how-to-give-in-to-threats-without-incentivizing-them

Basically, when you're faced with a probable extorter, you play Cooperate some of the time (so you don't always get nuked), but you Defect or Nuke back often enough that Hawkbot's expected value is lower than what it would get from Cooperate/Cooperate.
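A sketch of the arithmetic, using the illustrative payoffs from earlier in the thread (15 for a successful extorter, 10 for mutual cooperation, -100 when the threat is carried out): the concession probability p has to be low enough that extorting you pays worse than cooperating would have.

```python
# Sketch: how often you can afford to give in while still making extortion
# a losing proposition for Hawkbot (illustrative payoffs, not from the post).
EXTORTER_WIN, NUKED, CC = 15, -100, 10

def extorter_ev(p_give_in):
    # Hawkbot's expected value against you: it collects 15 when you cave,
    # and -100 when you refuse and the Nuke goes off.
    return p_give_in * EXTORTER_WIN + (1 - p_give_in) * NUKED

# Extortion is strictly worse than plain cooperation whenever p is below this:
p_threshold = (CC - NUKED) / (EXTORTER_WIN - NUKED)  # 110 / 115 ~ 0.9565
print(round(p_threshold, 4))
print(extorter_ev(0.95) < CC)   # True: threatening you doesn't pay
print(extorter_ev(0.97) < CC)   # False: now the threat is profitable
```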
