The heuristic that one should always resist blackmail seems a good one (no matter how tricky blackmail is to define). And one should be public about this, too; then, one is very unlikely to be blackmailed. Even if one speaks like an emperor.

But there's a subtlety: what if the blackmail is being used against a whole group, not just against one person? The US justice system is often seen to function like this: prosecutors pile on ridiculous numbers of charges, threatening uncounted millennia in jail, in order to get the accused to settle for a lesser charge and avoid the expense of a trial.

But for this to work, they need to occasionally find someone who rejects the offer, put them on trial, and slap them with a ridiculous sentence. Therefore, by standing up to them (or proclaiming in advance that you will reject such offers), you are not actually making yourself immune to their threats. You're setting yourself up to be the sacrificial one who is made an example of.

Of course, if everyone were a UDT agent, the correct decision would be for everyone to reject the threat. That would ensure that the threats are never made in the first place. But - and apologies if this shocks you - not everyone in the world is a perfect UDT agent. So the threats will get made, and those resisting them will get slammed to the maximum.

Of course, if everyone could read everyone's mind and was perfectly rational, then they would realise that making examples of UDT agents wouldn't affect the behaviour of non-UDT agents. In that case, UDT agents should resist the threats, and the perfectly rational prosecutor wouldn't bother threatening UDT agents. However - and sorry to shock your views of reality three times in one post - not everyone is perfectly rational. And not everyone can read everyone's minds.

So even a perfect UDT agent must, it seems, sometimes succumb to blackmail.


It's quite common for elderly patients' relatives to threaten me with all kinds of time-consuming or reputation-hurting bullshit, like complaints involving a lot of paperwork or bad press in the local newspaper, unless I wedge their relatively healthy grandpa straight from the hospital past the line to a nursing home as fast as possible, instead of sending him back home where he'd do just fine. I haven't caved in, and so far nothing bad has happened except some relatively harmless badmouthing. It's common for doctors to be blackmailed into doing all kinds of stuff, like unnecessary investigations or treatments.

You could say that in the US enough trials and press coverage have gone badly for doctors that blackmailing often works. I'm glad that this isn't the case in Finland and we're still practicing cost-effective medicine instead of covering our asses from all angles out of fear. The flip side of this is that incompetent doctors roam a bit too freely.

In a couple of cases I've made decisions that I've been blackmailed to make, not because of the blackmailing but because the blackmailer's interests happened to coincide with my medical reasoning. I find this problematic for signalling reasons.

I have made a prosecutor pale in the face by suggesting that courthouses should be places where people with plea bargains shop their offers around with each other so that they know what's a good deal and a bad deal.

what if the blackmail is being used against a whole group, not just against one person?

If the group is made up of UDT agents, then they clearly coordinate. If CDT agents are a small fraction of the group (assuming that transaction costs make perfect bargaining non-feasible for CDT agents, as usual), then UDT agents' (meta)-incentive to reject blackmail will be muted to some degree, depending on the fraction of CDT agents. The opposite consideration applies to the blackmailer's side: when faced with rejection, she has to expend resources on a costly punishment that will only affect the fraction of agents that's CDT. So her incentive to engage in blackmail in the first place rises as the fraction of UDT agents drops.
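The blackmailer's side of this can be put as a toy calculation. This is only a sketch under assumptions that are not in the comment above: she threatens everyone because she cannot tell the types apart, every UDT agent refuses and is punished at a fixed cost to her, and every CDT agent yields for a fixed gain. The names `gain`, `punish_cost`, and `udt_fraction` are made up for the illustration.

```python
def blackmailer_payoff(udt_fraction: float, gain: float, punish_cost: float) -> float:
    """Expected payoff per target when threatening everyone indiscriminately.

    Assumptions (hypothetical, for illustration only):
    - CDT agents, a share (1 - udt_fraction) of the group, yield and pay `gain`;
    - UDT agents, a share udt_fraction, refuse, and punishing each costs `punish_cost`.
    """
    if not 0.0 <= udt_fraction <= 1.0:
        raise ValueError("udt_fraction must be between 0 and 1")
    return (1.0 - udt_fraction) * gain - udt_fraction * punish_cost


def break_even_udt_fraction(gain: float, punish_cost: float) -> float:
    """UDT share above which blanket blackmail stops paying in this toy model."""
    return gain / (gain + punish_cost)


if __name__ == "__main__":
    # The payoff falls as the UDT share rises, and rises as it drops.
    for share in (0.0, 0.1, 0.25, 0.5):
        print(share, blackmailer_payoff(share, gain=1.0, punish_cost=3.0))
```

In this model the incentive to blackmail is linear in the UDT fraction, so a small minority of UDT agents lowers the payoff only slightly and does not remove it.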

On a different note, assuming that the informational environment is favorable, the best response to "group blackmail" is probably not for each agent to reject blackmail individually, but for all agents to coordinate on incenting whomever can reject blackmail at lowest cost. Under this assumption, UDT agents will have an (meta-)incentive to incent rejection by any agents in their group, including CDT agents. But still, the main result is unchanged; as the fraction of UDT agents falls, the resources expended in providing such incentives will drop proportionally.

If this is a communal setting, the logical step for the UDT agents is to coordinate, build a mutual blackmail prevention fund, and clearly signal their membership. And I'd guess such a thing exists.

Only works if UDT agents make up a significant proportion of agents in the setting. 10 UDT agents plus 1000 CDT agents, say, and the UDT agents are still vulnerable.

It also works if UDT agents can credibly distinguish themselves from non-UDT agents, whatever the proportions.

This requires not only that the UDT agents can reliably signal their UDT-ness to the blackmailers, but that the blackmailers can reliably signal to the non-UDTers that they can tell the difference. That is, letting the UDTers off might make the non-UDTers think that if they refuse the blackmail they'll also be let off.

So the ability of UDTers to resist blackmail depends not just on the properties of the UDTers and the blackmailers but also on those of the non-UDTers.

All y'all are assuming smart blackmailers.

The original example is of US prosecutors, right? I bet a standard prosecutor functions equivalently to a simple script:

if (pleads_guilty) { convict_reasonably() } else { throw_book() }

You can signal whatever you want to an agent executing this script, it's not going to care.
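For concreteness, here is a runnable rendering of that script as a sketch. The `signal` parameter is hypothetical: it is not in the one-liner above and is added only to show that the function accepts a signal and never reads it.

```python
def prosecutor(pleads_guilty: bool, signal: str = "") -> str:
    """Return the branch taken by the simple prosecutor script.

    `signal` is a hypothetical stand-in for anything the accused announces
    in advance; it is deliberately ignored.
    """
    if pleads_guilty:
        return "convict_reasonably"
    return "throw_book"


if __name__ == "__main__":
    # The outcome depends only on the plea, whatever is signalled.
    print(prosecutor(False, signal="I never give in to blackmail"))
    print(prosecutor(False))
    print(prosecutor(True, signal="I never give in to blackmail"))
```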

Right, the condition 'UDT agents can credibly distinguish themselves' sounds like a property of UDT agents but is actually a joint property of UDT agents and blackmailers.

That said, prosecutors ultimately follow that script because it works. I say 'ultimately' because it might be mediated by effects like 'they follow the script because they are rewarded for following it, and their bosses reward them for following it because it works'. The justice system is far from a rational agent, but it's also not an unincentivisable rock.

That said, prosecutors ultimately follow that script because it works

Yes, but note that here we are treating "works" as a binary variable and the presence of a minority of UDT agents in the target population is not going to switch "works" from true to false. In order for the prosecutors to care about signals, either a majority of the target population needs to credibly signal, or the throw_book() branch needs to have noticeable costs for prosecutors associated with it.

or the throw_book() branch needs to have noticeable costs for prosecutors associated with it.

It does, otherwise they would simply do it to all suspects.

otherwise they would simply do it to all suspects.

What makes you think they don't?

Courts are generally heavily booked, trials take forever, it's a perennial news issue that courts are underfunded (this seems to be a major factor behind the incredibly nasty and abusive rise in 'offender-funded' court systems & treating traffic violations & civil asset seizures as normal funding sources to be maximized) and I've seen estimates that as much as 90%+ of all cases resolve as plea bargains. There's no way the court system could handle a sudden 10-20x increase in workload, which is what would happen if prosecutors stopped settling for somewhat reasonable plea bargains and tried to throw the book at suspects who would then have little choice but to take it to trial.
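The 10-20x figure follows from the plea rate by simple arithmetic. This is a sketch under two simplifying assumptions: every case that is not pleaded goes to trial, and all trials take the same effort. The function name is made up for the illustration.

```python
def trial_load_multiplier(plea_rate: float) -> float:
    """Factor by which the number of trials grows if nobody pleads.

    With a share `plea_rate` of cases resolved by plea, only (1 - plea_rate)
    of cases currently reach trial; if all cases went to trial, the load
    would grow by 1 / (1 - plea_rate).
    """
    if not 0.0 <= plea_rate < 1.0:
        raise ValueError("plea_rate must be in [0, 1)")
    return 1.0 / (1.0 - plea_rate)


if __name__ == "__main__":
    for rate in (0.90, 0.95):
        print(f"plea rate {rate:.0%}: about {trial_load_multiplier(rate):.0f}x the trials")
```

A 90% plea rate gives roughly a tenfold increase and 95% roughly a twentyfold one, which matches the range quoted.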

(I recall reading about an attempt to organize defendants in one US court district to agree to not plea bargain, overloading the system so badly that most of the cases would have to be dropped; but I don't recall what happened and can't seem to refind it. I'm guessing it didn't work out, given that this is almost literally the prisoner's dilemma.)

Oh, sorry, I think I was unclear or probably even confusing. I didn't mean prosecutors actually just ship off all suspects to the courts with a long list of charges. I meant that they threaten everyone.

Obviously, a plea bargain makes things much easier for prosecutors so their usual goal is to obtain one. However if the accused is sufficiently stubborn, their choice is (a) to assemble a case and prosecute for a few charges; or (b) to assemble a case and prosecute for many charges. I don't think there is a major cost-to-prosecutors difference between (a) and (b) so they go for (b).

I didn't mean prosecutors actually just ship off all suspects to the courts with a long list of charges. I meant that they threaten everyone.

In that case, the argument you made here makes no sense.

Why is that?

You mean because prosecutors' incentives are mediated by the justice system, and the justice system has friction such that it won't react to a small change? Makes sense.

The extent to which this is actually true is a complicated factual question about the US justice system.

Agreed. But still less so than before.

I was thinking about this question with regard to whether CDT agents might have a simple Bayesian reason to mimic UDT agents, not only in any pre-decision signaling, but also in the actual decision. And I realized that an important feature of these problems is that the game ends precisely when the agent submits a decision. This highlights the feature of UDT that distinguishes its cooperation from simple Bayesian reasoning, a distinction that becomes important when you start adding qualifiers that include unknowns about other agents' source code. The game may have any number of confounders and additional decision steps before the final step, but UDT is the only feature that allows cooperation on that final step.

A UDT+ agent would clearly communicate her resistance to blackmail and cause the blackmailer to pick an easier target.

In the case above - no. Without definite ways of distinguishing UDT+ agents from other agents, they would still get blackmailed, lest other agents try and pretend to be UDT+ agents.