In the last post, we saw an example of how dath ilan's Algorithm calls for negotiators to use non-credible threats to achieve fair outcomes when bargaining. The Algorithm also has a lot to say about threats more generally. In particular, it says that there is a type of threat that shouldn't appear in the equilibria of ideal agents, because ideal agents don't give in to threats of that kind. What sorts of threats should ideal agents make? And what sorts of threats should they give in to?

Threats in dath ilan

There is a word in Baseline, the language of dath ilan, which is simply translated as "threat". One connotation of this word is non-credibility.

Because, of course, if you make a threat against somebody, the only reason to do so is that you believe they'll respond to it; that, intuitively, is the defining feature of a threat.

The government of dath ilan does not use (non-credible) threats to enforce its laws. It only imposes penalties that it has an incentive to carry out even when a citizen really has broken the law. (This is in contrast to a law whose penalty the government has no interest in enforcing once it reaches that subgame, and which therefore constitutes a non-credible threat.)

The dath ilani built Governance in a way more thoroughly voluntarist than Golarion could even understand without math, not (only) because those dath ilani thought threats were morally icky, but because they knew that a certain kind of technically defined threat wouldn't be an equilibrium of ideal agents; and it seemed foolish and dangerous to build a Civilization that would stop working if people started behaving more rationally.

So to recap: there is some notion of "threat" that should never appear in any real policy adopted by ideal agents. (Where a policy defines how an agent will behave in any situation.) Some credible threats, like exiling murderers, are compatible with the Algorithm, as are some non-credible threats, like refusing an unfair offer even when doing so locally results in receiving less payoff than accepting.
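
To pin down the term, here is a minimal sketch of "policy" in the sense used here, as a mapping from situations to actions; the type names and the ultimatum-game example are purely illustrative.

```python
# A minimal sketch of "policy" as used here: a mapping from situations
# (observations/histories) to actions. Names are illustrative only.
from typing import Callable, TypeVar

Situation = TypeVar("Situation")
Action = TypeVar("Action")

# Deterministic case: a policy picks an action in every situation.
Policy = Callable[[Situation], Action]

# Example: responder policies in an ultimatum game, where the "situation"
# is the share offered (as a fraction) and the action is accept/reject.
accept_anything: Callable[[float], bool] = lambda offer: True
refuse_unfair: Callable[[float], bool] = lambda offer: offer >= 0.5
```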

There are also times when one agent simply has a lot of power over another, and can unilaterally dictate an unfair outcome. When another party can't fight back, a policy of taking their stuff is credible, but it certainly isn't nice.

Making and Responding to Threats

I suggest that a key criterion is: "If I were to choose a policy that is a best response to the other agent's policy, would the resulting outcome be socially optimal?" This is an attempt to take the Algorithm's prescription for bargaining and generalize it to arbitrary two-player games.

When the other player has already chosen their policy, treat that policy like an offer. Best-responding to that offer is accepting. Rejecting and doing something else is a non-credible threat, which you should sometimes still do in order to incentivize the other player to make socially-good offers.
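
For concreteness, here is one minimal sketch of what such a rejection policy might look like in a toy ultimatum game, in the spirit of the previous post. The pie size, the exact rejection schedule, and the 0.95 discount factor are assumptions made up for illustration, not a restatement of the Algorithm.

```python
import random

PIE = 10             # total resources to split
FAIR_SHARE = PIE / 2

def accept_probability(offered_to_me):
    """Accept fair-or-better offers outright; accept worse offers only with
    a probability calibrated so the proposer expects slightly less than
    they'd have gotten by offering the fair split."""
    if offered_to_me >= FAIR_SHARE:
        return 1.0
    proposer_keeps = PIE - offered_to_me
    return 0.95 * FAIR_SHARE / proposer_keeps

def respond(offered_to_me, rng=random.random):
    """The (sometimes non-credible) policy: occasionally reject an unfair
    offer even though accepting would locally pay more."""
    return rng() < accept_probability(offered_to_me)

# From the proposer's point of view, grabbing more than half never pays:
for keep in range(int(FAIR_SHARE), PIE + 1):
    expected = keep * accept_probability(PIE - keep)
    print(f"proposer keeps {keep:2d} -> expected payoff {expected:.2f}")
```

Against this policy, the proposer's best response is the fair split, which is the point: the occasional rejection is locally costly, but it makes socially good offers the profitable ones.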

When proposing a policy that you would be willing to follow, or unilaterally deciding on one, select a policy which, if best-responded to, results in a socially optimal outcome. (According to your social choice theory.)
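
To make the criterion itself concrete, here is a minimal sketch in the same toy ultimatum game. The pie size, the function names, and the use of the Nash product as the social choice rule are all illustrative assumptions, not anything specified by the Algorithm.

```python
PIE = 10  # total resources to split

def outcome(keep, accept):
    """Payoffs (proposer, responder) when the proposer keeps `keep`
    out of PIE and the responder accepts or rejects."""
    if accept:
        return keep, PIE - keep
    return 0, 0  # rejection destroys the pie

def responder_best_response(keep):
    """Locally, accepting is a (weak) best response whenever it pays
    at least as much as rejecting."""
    return (PIE - keep) >= 0

def social_welfare(payoffs):
    """One possible social choice rule: the Nash product.
    (Any rule that prefers fair splits makes the same point.)"""
    a, b = payoffs
    return a * b

# The criterion: for each policy the proposer could propose, ask whether
# the outcome is socially optimal *if the responder best-responds to it*.
scores = {
    keep: social_welfare(outcome(keep, responder_best_response(keep)))
    for keep in range(PIE + 1)
}
best = max(scores.values())
for keep, score in scores.items():
    payoffs = outcome(keep, responder_best_response(keep))
    print(f"proposer keeps {keep:2d} -> payoffs {payoffs}, "
          f"passes the criterion: {score == best}")
```

Under this (assumed) social choice rule, the only proposer policy that passes the check is the even split; a grabbier proposal still gets accepted by a locally best-responding responder, which is exactly why the responder-side prescription above matters.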

Note that this approach calls for treating other agents in a morally good way, even in games like "one agent dictates the split of resources and the other agent takes what they get." This could be compatible with abject selfishness, if we're willing to accept arbitrary definitions of terms like "morally good" and "fairness". For example, we could define fairness in terms of bargaining power, such that in the limit it's "fair" for powerless agents to get nothing. 

At the other extreme, we could also define fairness to be completely independent of bargaining power, such that agents with nothing to give us and nothing to withhold from us get exactly the same share of resources as if they did. It certainly seems nicer to treat other agents as if they have at least some bargaining power. It's how we would want to be treated if the power dynamics were reversed.
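
As a toy illustration of the two extremes (the functional forms and the "power" numbers are assumptions, chosen purely for contrast):

```python
# Two toy "fairness" notions, to make the contrast concrete.
# The functional forms here are illustrative assumptions only.

def fair_share_by_power(total, my_power, their_power):
    """Fairness indexed to bargaining power: as the other agent's power
    goes to zero, so does their 'fair' share."""
    return total * their_power / (my_power + their_power)

def fair_share_power_blind(total, my_power, their_power):
    """Fairness independent of bargaining power: an equal split,
    regardless of who could take what from whom."""
    return total / 2

print(fair_share_by_power(10, my_power=99, their_power=1))     # 0.1
print(fair_share_power_blind(10, my_power=99, their_power=1))  # 5.0
```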

Moral Responsibility

It seems morally wrong to build software systems that are smart enough to identify both the right thing to do and the selfish thing to do, and that then act selfishly in cases where those are misaligned.

In general, upon noticing that incentives are pushing agents towards socially sub-optimal outcomes, I believe that a reasonable Algorithm will call for those agents to reshape their collective incentives so that their individual interests are aligned with their collective interests. And along the way, when we notice that we are still in a position to do better for ourselves at the expense of the social good, we should do the right thing anyway because it's the right thing to do.
