Yet More Modal Combat

Interesting.

I'm worried about this bot (I'll call it EsotericBot, EB, since it only cooperates if you cooperate, but not obviously so). If EB cooperates with FB_T (FairBot using theory T), then T must prove that W is consistent: EB(FB_T) implies FB_T(EB), which implies that T proves that W does not prove FB_T(EB), which implies that T proves W is consistent. This might just be a detail of your definition that could be fixed with something like what's done in PrudentBot; it doesn't look like it at a glance, but I'm not sure. Or you might want to bite that bullet and lose cooperation with these "weak" FairBots, but that seems surprising, and it seems like we'd have gone too far in defecting against people who obviously cooperate.
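Spelled out in symbols, the chain above might look as follows. This is only a sketch of the argument as stated: the notation $\Box_W$ for "W proves", and reading $FB_T(EB)$ as the sentence "FB_T cooperates with EB", are assumptions of this rendering, not notation from the post.

$$
\begin{aligned}
EB(FB_T) = C &\implies FB_T(EB) = C \\
&\implies T \vdash \neg \Box_W \left[ FB_T(EB) = C \right] \\
&\implies T \vdash \mathrm{Con}(W)
\end{aligned}
$$

The last step uses the fact that an inconsistent theory proves every sentence: if T proves that W fails to prove some particular sentence, then T proves that W is consistent (assuming T is strong enough to formalize that fact).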

>The hypothetical opponent may notice this apparent anomaly and decide investigating it is far more important than the prisoners dilemma.

This doesn't show up in the formal model. Do you mean, if we're thinking about AIs in general as they might look if they reason in the same spirit as these modal bots? I'm not sure why this is a constraint you want; it seems very constraining. One thing we could try is to construct plausible counterfactuals by conditioning, rather than causal-counterfacting, on our opponent playing against some other bot. In the causal counterfactual the AI would hear "please welcome into the Ring... ComplexityBot, fighting FairBot!" and then FairBot goes in, but instead of ComplexityBot it sees just a DefectBot, and is like "um, what?". If instead we condition on "FairBot fights DB", that updates the ring announcer and lots of other stuff, hopefully so that FB doesn't notice any difference from really fighting DB. This seems vaguely analogous to just asking what FB(DB) is: in modal bots, asking about FB(DB) doesn't seem to work by counterfacting the current situation on "I'm actually DB" in any sense; it just asks what FB does in this other situation.
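To make "just asking what FB does in this other situation" concrete, here is a minimal Python sketch that evaluates simple modal bots on a linear Kripke chain, the standard way to compute outcomes for modal agents over a single proof system. It is an illustration under stated assumptions, not the post's construction: the names `fair_bot`, `defect_bot`, `cooperate_bot` and `play` are hypothetical, only bots that look at the opponent's move against themselves are covered (so no PrudentBot), and the two-theory W/S setup of the post is not modeled.

```python
from typing import Callable, List, Tuple

# True = Cooperate, False = Defect.
# An agent sees the opponent's moves (against this agent) at all lower
# worlds of the Kripke chain and returns its own move at the current world.
Agent = Callable[[List[bool]], bool]


def fair_bot(opp_lower: List[bool]) -> bool:
    # "Box(opponent cooperates with me)": true at a world iff the opponent
    # cooperates at every strictly lower world (vacuously true at world 0).
    return all(opp_lower)


def defect_bot(opp_lower: List[bool]) -> bool:
    return False


def cooperate_bot(opp_lower: List[bool]) -> bool:
    return True


def play(a: Agent, b: Agent, depth: int = 8) -> Tuple[bool, bool]:
    """Evaluate both agents world by world and return the moves at the top world.

    Assumption: depth 8 is far more than these one-box agents need to stabilize.
    """
    hist_a: List[bool] = []
    hist_b: List[bool] = []
    for _ in range(depth):
        move_a = a(hist_b)  # a only sees b's moves at lower worlds
        move_b = b(hist_a)  # b only sees a's moves at lower worlds
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a[-1], hist_b[-1]


if __name__ == "__main__":
    print("FB vs DB:", play(fair_bot, defect_bot))
    print("FB vs FB:", play(fair_bot, fair_bot))
    print("FB vs CB:", play(fair_bot, cooperate_bot))
```

Note that `play(fair_bot, defect_bot)` is computed from scratch as its own match; nothing about the "current" match is modified to obtain it, which is the sense in which FB(DB) is a separate question rather than a counterfactual edit.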

>Although I suspect this can be weakened somewhat.

If we're looking at, say, Boolean combinations of modal statements, this translates to $\Delta_2$ I think, so we can just ask for $\Delta_2$-soundness.

>proof searches in the same language (say PA) but with S having a much larger max proof length

This paper is relevant:

Parametric Bounded Löb's Theorem and Robust Cooperation of Bounded Agents by Andrew Critch https://arxiv.org/abs/1602.04184

Adrià Garriga-alonso (4y)

Good design! A name for it could be the TemptedBot since it tries to go for the temptation payoff, or the ExploitBot (short: Explobot) since it tries to exploit the opponent.

One thing that you did not get around to writing is that if the Explobot's weak system is W=PA, and it plays against a FairBot that uses PA, the bots will play defect-defect. This is because the FairBot cannot prove that the Explobot takes the first "else" branch, and thus cannot prove that the Explobot cooperates. Then the FairBot defects, and as a consequence so does the Explobot.

Diffractor (4y)

Any idea how well this would generalize to games like Chicken, or to games with more than 2 players or more than 2 moves?

Donald Hobson (4y)

Not yet. I'll let you know if I make a follow-up post with this. Thanks for a potential research direction.

TekhneMakre (4y)
