I’ll accept no less than 50%, and fight if we disagree
Do you mean the commitments follow an "all or nothing" pattern, where if both sides commit to 51% they're doomed?
I imagine that commitments might be less extreme, where overlap is costly but not fatal:[1]
Suppose each side commits to taking 51%, and the rule of each commitment is to punish the other side by destroying anything beyond 49% that the other side takes, and then destroying a further 0.5% for every 1% by which the committing side's own share falls short of 51%.
Each side takes 50%, but destroys 1% of the other side's pie, so each side is left with only 49%. They both realize they received 2% less than the target 51%. This means each side destroys 1% of what the other side has, so each side now has only 48%. This is 3% less than 51%, so they further destroy 0.5% of what the other side has, so they're both left with 47.5%.
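To make the escalation above easier to follow, here is a minimal Python sketch of the symmetric case. The function name, parameter names, and stopping tolerance are my own illustrative choices, and I assume each side only punishes the part of its shortfall that it has not already punished. Under those assumptions the shares follow 49%, 48%, 47.5%, ... and approach 47%, so overlap is costly but not fatal.

```python
def simulate_mutual_punishment(target=51.0, cap=49.0, rate=0.5,
                               initial_take=50.0, max_rounds=60):
    """Trace each side's remaining share (in % of the pie), symmetric case.

    Assumption (not from the original text): both sides apply the same rule,
    and each round punishes only the shortfall not yet punished.
    """
    # First, each side destroys whatever the other took beyond `cap`.
    share = min(initial_take, cap)
    history = [share]
    punished = 0.0  # shortfall (in percentage points) already punished
    for _ in range(max_rounds):
        shortfall = target - share
        new_shortfall = shortfall - punished
        if new_shortfall < 1e-9:
            break  # nothing meaningful left to punish
        # Destroy `rate` points of the other side's pie per point of new shortfall.
        share -= rate * new_shortfall
        punished = shortfall
        history.append(share)
    return history


history = simulate_mutual_punishment()
print(history[:4])             # first few steps of the escalation
print(round(history[-1], 6))   # where the process settles
```

The limit can also be checked by hand: if both sides end with x%, then x = 49 - 0.5 * (51 - x), which gives x = 47.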
It is important to punish the other side for punishing you, at least a little.
If you only destroy what the other side takes past 49%, but do not destroy further based on how little you got, then the other side can get away with committing to take 70% of the pie, betting on the small chance that you are a sucker and only ask for 30%. If they are correct, they get 70% of the pie. If they are wrong, you will destroy their pie until they are left with only 49%, and they will destroy your pie until you are left with less than 30%, but they still end up with the most they could have gotten by committing to any lesser amount. So committing to take 70% costs them nothing: it only hurts you, and it has a small chance of benefiting them.
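The incentive problem can be put in numbers with a rough Python sketch. It covers only the "cap-only" rule, where you destroy what the other side takes past 49% and nothing more. The function name, the probability `p_sucker`, and the assumption that a sucker simply concedes whatever is demanded (up to 70%) are mine, for illustration only.

```python
def expected_share_cap_only(demand, p_sucker, cap=49.0):
    """Expected final share (in %) for a side that commits to `demand`.

    Assumptions (hypothetical, for illustration):
      - with probability `p_sucker` the opponent concedes the full demand;
      - otherwise the opponent only destroys what the demander takes
        beyond `cap`, with no further punishment.
    """
    tough_share = min(demand, cap)  # the cap rule never leaves more than `cap`
    return p_sucker * demand + (1.0 - p_sucker) * tough_share


for demand in (51.0, 60.0, 70.0):
    print(demand, expected_share_cap_only(demand, p_sucker=0.05))
```

Under these assumptions the expected share never decreases as the demand rises, which is why some further punishment for the over-demand is needed to make large demands costly.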
PS: I'm not saying your post is wrong, since it's clearly titled "A high-level model of AI bargaining" rather than "A very detailed model of AI bargaining!" I just feel this detail is worth mentioning.
Advanced AIs might be capable of various credible commitments unavailable to humans, which they could use when bargaining with each other. “Bargaining” can sound like something pretty specific: haggling over (literal) prices. But, in the sense discussed in Schelling’s The Strategy of Conflict for instance, “bargaining” refers to any attempt to resolve a dispute over resources — from algorithmic trading and litigation, to diplomacy between national AGI projects and negotiations over norms for space settlement.
To think clearly about interventions to mitigate conflict between AIs, I think it’s important to ground our research and strategy in a very general qualitative model of bargaining with commitments. This post sketches such a model, plus some more concrete examples of its building blocks.
I plan to explain some crucial implications in future writings. But as a teaser, this model doesn’t imply agents will play a Nash equilibrium!
Model
Where does this model come from? Basically, I started with the classic model from open-source game theory or “program equilibrium” literature. Then, I relaxed several assumptions to allow for some realistic, strategically relevant dynamics. That said, I’ve glossed over some other important dynamics for ease of exposition. I’ll say more on these at the end of the post.
High-level setup
Two AI agents, Alice and Bob, interact over two phases: before vs. after some time T, defined below. It will help to start with the “after” phase.
Bargaining phase (after T): Alice and Bob bargain over some contested resource. Specifically, “bargaining” consists in credibly reporting to each other their (i) demands/offers and (ii) policies for which outside options they’d each take if bargaining failed (such as leaving the resource alone, or initiating conflict). Bargaining ends when they either:
How each agent decides (i) and (ii) is determined by some procedure chosen before the bargaining phase, as follows.
Pre-bargaining phase (before T): Each agent might try to shape the other’s incentives by credibly committing to constraints on their procedure for deciding (i) and (ii) — e.g., committing to never accept less than 50%. So they need to decide:
The agents make these decisions under uncertainty about each other’s decisions, though they can resolve some of this uncertainty via (2).
Now for more details on how these commitments to bargaining procedures might work, and on the three actions above.
Bargaining programs
Each agent’s procedure for what they’ll do in the bargaining phase, called a program, takes as input information about the other agent’s program, as well as other features of the strategic situation. [3] (See (2) below for an example of a relevant “feature of the strategic situation”.)
As a very simplified example, Alice might follow the program: “If I can prove that Bob’s program would eventually accept my demands if I stuck to them, then I’ll demand 100%. Otherwise, I’ll accept no less than 50%, and fight if we disagree.” So, the AIs can implement conditional commitments, instead of necessarily either locking in rigid demands or conceding to whoever commits first.
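As a rough illustration of such a conditional program, here is a toy Python sketch. It is not the formalism of the post: the `BobProgram` fields are hypothetical, and the flag `yields_to_firm_demands` stands in for "Alice can prove that Bob's program would eventually accept her demands", which a real program would have to establish by actually inspecting Bob's program.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BobProgram:
    """Hypothetical summary of Bob's committed program, as seen by Alice."""
    yields_to_firm_demands: bool  # stand-in for a proof that Bob eventually accepts
    min_share: float              # least share Bob will accept, in [0, 1]


def alice_program(bob: BobProgram) -> Tuple[float, str]:
    """Return (Alice's minimum acceptable share, outcome of bargaining)."""
    if bob.yields_to_firm_demands:
        # Bob would concede, so Alice demands everything.
        return 1.0, "agree"
    # Otherwise Alice accepts no less than 50%, and fights on disagreement.
    alice_min = 0.5
    if alice_min + bob.min_share <= 1.0:
        return alice_min, "agree"
    return alice_min, "fight"


print(alice_program(BobProgram(yields_to_firm_demands=True, min_share=0.9)))
print(alice_program(BobProgram(yields_to_firm_demands=False, min_share=0.5)))
print(alice_program(BobProgram(yields_to_firm_demands=False, min_share=0.51)))
```

The point of the sketch is only that Alice's demand is a function of Bob's program rather than a fixed number, which is what makes the commitment conditional.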
Then, T is the first time both agents know which single program each other has committed to. [4]
Pre-bargaining actions
At any time t < T, each agent can do one of three actions:
Concrete examples of aspects of this model
Commentary on the model’s assumptions
The model makes the simplifying assumptions listed below. None of these are trivial. But overall, I think it will be fruitful to start by working out the main implications of the model as-is, and relax these assumptions from there.
However, we don’t assume the agents:
This last point is worth a closer look. Indeed, I think dropping the equilibrium assumption is one of the most important starting points for a good theory of AI bargaining. But we’ll get to that in another post. [6]
[1] Echoing the safe Pareto improvements agenda post: “Commitments” are meant to include modifications to one’s decision theory or values/preferences. It has been argued (example) that decision theories like updateless decision theory (UDT) can sidestep the need for “commitments” in the usual sense. We’ll set this question aside here, and treat the resolution to make one’s future decisions according to UDT as a commitment in itself.
[2] We define a commitment’s “credibility” relative to the set of agents the commitment needs to be made credible to. In some contexts, agents might want to make commitments that they can’t make credible to others. E.g., they might follow acausal decision theories and expect that if they commit to participate in evidential cooperation in large worlds (ECL), others are more likely to make the same commitment. These commitments are (vacuously) “credible”, because they don’t need to be made credible to anyone else.
[3] This is inspired by the “program game” formalism of Tennenholtz (2004), but my model isn’t committed to the specific assumptions in that paper, most notably that players choose programs simultaneously. As described in the “Bargaining phase”, we allow for strategic decision-making to be carried out by the program itself, not just by the agent choosing the program.
[4] More generally, we could define each agent’s subjective T as the first time after which (a) that agent has decided a single program and (b) they know the other agent’s single program. But as far as I can tell, the implications of the model aren’t sensitive to this.
[5] As a point of contrast, the first five assumptions are made by this paper, which I nonetheless consider an important result in AI bargaining theory.
[6] Thanks to Nathaniel Sauerberg for helpful comments.