Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Let be a list of modal agents as described in Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic. If a modal agent is going to play against each of these agents exactly once in the prisoners dilemma, what is the greatest possible score that can achieve? How can one find a modal agent which achieves that score? This is the first in a series of posts I will make investigating how to answer that question.

We start by turning this into a decision problem. Given two lists of modal agents as input, , and , we ask whether or not there exists an which defects against and gets to cooperate with it. We call this decision problem Modal SAT.

In the special case where and , we call this decision problem Exploit.

We will also look at two variations of Modal SAT. In SC Modal SAT, we additionally require that cooperates with itself. In NE Modal SAT, we require that the agent is non exploitable.

Note that in these decision problems, we do not require actually constructing the agent , but all of the algorithms which we plan to discuss will be constructive.

Further, note that the original optimization problem does not reduce obviously in polynomial time to Modal SAT. This is because for a given list of agents and a given score, there could be exponentially many ways to cooperate and defect with the agents to achieve that score. We do however have an exponential time reduction, and we might be able to construct a polynomial time reduction.

As we will soon see, Modal SAT is NP-Hard. However, we believe that we can construct a program that can solve practical instances of Modal SAT quickly. Mihaly Barasz is working on writing up such a program, while I write up some theoretical results about the problem. I suspect that I will be able to prove that all for decision problems are NP-Complete (and therefore in NP), but have not gone through all the details. I further expect to show that Modal SAT and SC Modal SAT are actually the same problem. (no reduction necessary)

I am uncertain what order I will make posts in this series, and I am further uncertain whether or not I will get to all the posts that I plan to. This series is not very high priority for me. As I make new posts, I will post links here.

  1. Introduction
  2. Self Cooperation
New Comment