Unpacking the Concept of "Blackmail"

Keep in mind: Controlling Constant Programs, Notion of Preference in Ambient Control.

There is a reasonable game-theoretic heuristic, "don't respond to blackmail" or "don't negotiate with terrorists". But what is actually meant by the word "blackmail" here? Does it have a place as a fundamental decision-theoretic concept, or is it merely an affective category, a class of situations activating a certain psychological adaptation that expresses disapproval of certain decisions and on net protects (benefits) you, like those adaptations that respond to "being rude" or "offense"?

We, as humans, have a concept of a "default", a "do nothing" strategy. Other plans can be compared to the moral value of the default: doing harm is something worse than the default, doing good something better than the default.

Blackmail is then a situation where by decision of another agent ("blackmailer"), you are presented with two options, both of which are harmful to you (worse than the default), and one of which is better for the blackmailer. The alternative (if the blackmailer decides not to blackmail) is the default.

Compare this with the same scenario, but with the "default" action of the other agent being worse for you than the given options. This would be called normal bargaining, as in trade, where both parties benefit from exchange of goods, but to a different extent depending on which cost is set.
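
As a minimal sketch of the two definitions above, assume each outcome can be summarized by a single number (your payoff) and that the default payoff is supplied as a separate input. The function name `classify_offer` and all numbers are illustrative assumptions, not part of the original argument.

```python
def classify_offer(default, options):
    """Label a situation relative to an assumed default payoff.

    default: your payoff if the other agent had not acted at all.
    options: your payoffs for each option actually presented to you.
    """
    if all(p < default for p in options):
        return "blackmail"    # every option is worse than the default
    if all(p > default for p in options):
        return "bargaining"   # every option is better than the default
    return "mixed"            # not covered by either definition


# The same two options under two different assumed defaults (hypothetical numbers).
options = [-10, -3]
print(classify_offer(0, options))    # default is better than both options
print(classify_offer(-20, options))  # default is worse than both options
```

The label flips when only `default` changes, so in this sketch the whole classification rests on an input that is not one of the options actually on the table.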

Why is the "default" special here? If bargaining or blackmail did happen, we know that "default" is impossible. How can we tell two situations apart then, from their payoffs (or models of uncertainty about the outcomes) alone? It's necessary to tell these situations apart to manage not responding to threats, but at the same time cooperating in trade (instead of making things as bad as you can for the trade partner, no matter what it costs you). Otherwise, abstaining from doing harm looks exactly like doing good. A charitable gift of not blowing up your car and so on.

My hypothesis is that "blackmail" is what the suggestion of your mind to not cooperate feels like from the inside, the answer to a difficult problem computed by cognitive algorithms you don't understand, and not a simple property of the decision problem itself. By saying "don't respond to blackmail", you are pushing most of the hard work into intuitive categorization of decision problems into "blackmail" and "trade", with only correct interpretation of the results of that categorization left as an explicit exercise.

(A possible direction for formalizing these concepts involves introducing some kind of notion of resources, maybe amount of control, and instrumental vs. terminal spending, so that the "default" corresponds to less instrumental spending of controlled resources, but I don't see it clearly.)

(Let's keep on topic and not refer to powerful AIs or FAI in this thread, only discuss the concept of blackmail in itself, in decision-theoretic context.)


I wonder if this question is related to the revulsion many people feel against certain kinds of price discrimination tactics. I mean things like how in the 19th century, train companies would put intentionally uncomfortable benches in the 3rd class carriages in order to encourage people to buy 2nd class tickets, or nowadays software that comes with arbitrary, programmed-in restrictions that can be removed by paying for the "professional" version.

People really don't like that! It seems like there is some folk-ethics norm that "if you can make me better off with no effort on your part, then you have an obligation to do so", which seems like part of a "no blackmail" condition.

That makes sense from a reciprocal altruism perspective. If someone can benefit you at no cost to themself, and doesn't, that probably indicates a lack of intent to cooperate under all circumstances. The natural response is hostility.

In Bombay, the only difference between first- and second-class cars is the price. The second-class cars are more crowded. I've been trying to think of a nice analogy to blackmail but haven't found one.

It seems that we cry blackmail when a Schelling point already exists, and the other agent is threatening to force us below it. The moral outrage functions as a precommitment to punish the clear defection.

In normal human life, 'do nothing' is the Schelling point, because most people don't interact with most people. But sometimes the Schelling point does move, and it seems what constitutes blackmail does too: if a child's drowning in a pond, and I tell you I'll only fish him out if you give me $1,000, it seems like I'm blackmailing you.

Sometimes both sides feel like they're being blackmailed though; like when firefighters go on strike, and both city hall and the union accuse the other of endangering people. Could this be put down to coordination problems?

if a child's drowning in a pond, and I tell you I'll only fish him out if you give me $1,000, it seems like I'm blackmailing you.

Perhaps a borderline case like this is most helpful. Is this extortion, even though the default here isn't 'doing nothing'? The default is saving the child, because that is what someone should do.

So maybe the word is difficult to unpack because it has morality behind it. A person shouldn't bomb your car, and shouldn't expose your private secrets. On the other hand, they needn't give you food, so it's OK to ask for money for that.

If I demand money for being faithful to my husband, then that is extortion because I'm supposed to be faithful. If, however, I want a divorce and would divorce him, I'm allowed to let him pay me for faithfulness. Such gray areas indicate to me that it is indeed about some notion of expected/moral behavior.

Selling food to starving families -- when they become so poor that you ought to give them food for free -- then that is extortion.

So: demanding more compensation when you should do it for less (or demanding any when you should do it for free).

Don't even get me started on how ill-defined and far from being formally understood the concept of "Schelling point" is. It's very useful in informal game theory of course.

Yeah, I'm reading Strategy of Conflict at the moment. Still, it seems that working out Schelling points would give us blackmail, whilst understanding blackmail some other way wouldn't give us Schelling points (as the latter can be without communication, etc.)

A Schelling point is a kind of Nash equilibrium, right? It's the kind of equilibrium that an understanding of human psychology and the details of the situation says you should expect.

The union-firefighter case looks like a variant on the hawk-dove/chicken game. If default is (Dove,Dove), which isn't an equilibrium, Hawk can be seen as a blackmail action as it makes you worse off than default. So at (Hawk,Hawk) everyone is, in fact, being blackmailed, and this is, essentially, a coordination problem.
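
A small sketch can check the equilibrium claim for one concrete chicken game. The payoff numbers below are hypothetical, chosen only to have the usual chicken ordering; the comment itself gives no numbers.

```python
from itertools import product

# Hypothetical chicken payoffs:
# PAYOFF[(row, col)] = (row player's payoff, column player's payoff).
PAYOFF = {
    ("Dove", "Dove"): (0, 0),
    ("Dove", "Hawk"): (-1, 1),
    ("Hawk", "Dove"): (1, -1),
    ("Hawk", "Hawk"): (-10, -10),
}
ACTIONS = ("Dove", "Hawk")


def is_pure_nash(row, col):
    """True if neither player gains by unilaterally switching action."""
    row_ok = all(PAYOFF[(row, col)][0] >= PAYOFF[(alt, col)][0] for alt in ACTIONS)
    col_ok = all(PAYOFF[(row, col)][1] >= PAYOFF[(row, alt)][1] for alt in ACTIONS)
    return row_ok and col_ok


equilibria = [cell for cell in product(ACTIONS, ACTIONS) if is_pure_nash(*cell)]
print(equilibria)  # [('Dove', 'Hawk'), ('Hawk', 'Dove')]
```

With these numbers (Dove,Dove) is indeed not an equilibrium, since either player gains by switching to Hawk. (Hawk,Hawk) is not an equilibrium either, but it leaves both players worse off than the (Dove,Dove) default, which matches the "everyone is being blackmailed" reading.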

This is a clever idea, but I don't think it works: you need to unpack the question of why a decision algorithm would deem cooperation non-optimal, and see if it coincides with a special class of problems where cooperation is generally non-optimal.

So I think what gets an offer labeled as blackmail is the recognition that cooperation would lead the other party to repeatedly use their discretion to force my next remaining options to be even worse. So blackmail and trade differ in that:

  • If I cooperate with a blackmailer, they are more likely to spend resources "digging up dirt" on me, kidnapping my loved ones, etc. I don't want to be in that position, regardless of what I decide to do then.
  • If I trade with a trade-offerer, they are more likely to spend resources acquiring goods that I may want to trade for. I do want to be in the position where others make things available to me that I want (except for where I'd be competing with them in that process.)

And yes, these two situations are equivalent, except for what I want the offerer to do, which I think is what yields the distinction, not the concept of a baseline in the initial offer.

You can phrase blackmail as a sort of addiction situation where dynamic inconsistency potentially leaves me vulnerable to exploitation. My preferences at any time t are:

1) Not have an addiction.
2) Have an addiction, and take some more of the drug.
3) Have an addiction, and not take the drug.

where I'm addicted at time t, and taking the drug will make me addicted at time t+1 (and I otherwise won't be addicted at t+1).

In this light, one can view the classification of something as blackmail, as being any feeling or mechanism that makes me choose 3) over 2). "2 looks appealing, but I feel a strong compulsion to do 3." Agents with such a mechanism gain a resistance to dynamic inconsistency.

In contrast, if "addiction" were good, and the item in 1) were moved below 3) in my preference ranking, then I wouldn't benefit from a mechanism that makes me choose 3 over 2. That would feel like trade.

And yes, these two situations are equivalent, except for what I want the offerer to do, which I think is what yields the distinction, not the concept of a baseline in the initial offer.

Yes, the distinction is in the way you prefer to acausally observation-counterfactually influence the other player. Not being offered a trade shouldn't be considered irrelevant by your decision algorithm, even if given the observations you have it is impossible. Like in Counterfactual Mugging, but with the other player instead of a fair coin. Newcomb's with transparent boxes is also relevant.

Like in Counterfactual Mugging, but with the other player instead of a fair coin. Newcomb's with transparent boxes is also relevant.

Exactly, which is why I consider the hazing problem to be isomorphic to CM, and akrasia to be a special case of the hazing problem.

Time-inconsistency seems unrelated. It may be a problem in implementing the strategy "don't respond to blackmail", but one can certainly TRY to blackmail a time-consistent person, if one believes them to be irrational or if they have only one blackmail-worthy secret.

I know this isn't quite rigorous, but if I can calculate the counterfactual "what would the other player's strategy be if ze did not model me as an agent capable of responding to incentives," blackmail seems easy to identify by comparison to this.

Perhaps this can be what we mean by 'default'?

I think this ties into Larks' point -- if Larks didn't think I responded to incentives, I think ze'd just help the child, so asking me $1,000 would be blackmail. Clippy would not help the child, and so asking me $1,000 is trade.

To first order, this means that folks playing decision-theoretic games against me actually have an incentive to self-modify to be all-else-equal sadistic, so that their threats can look like offers. But then I can assume that they would not have so modified in the first place if they hadn't modelled me as responding to incentives, etc. etc.

"an agent incapable of responding to incentives" is not a well-defined agent. What do you respond to? A random number generator? Subliminal messages? Pie?

Pie?

I respond to pie. Are you offering pie?

Should you find yourself in the greater Boston area, drop me a line and I will give you some pie.

(I suspect that there is a context to this comment, and I might even find it interesting if I were to look it up, but I'm sort of enjoying the comment in isolation. Hopefully it isn't profoundly embarrassing or anything.)

Can I take you up on that as well? You can never have too much pie.

Well, you're certainly free to drop me a line if you're in the area, but I'm far less likely to respond, let alone respond with pie.

Which option for you is "not responding", the "default"? Maybe you give away $1000 by default, and since that leads to children not drowning, the better-valued outcome, it looks more like "least effort". How do you measure effort?

I really wish "blackmail" were not used to mean extortion.

I had the same reaction, thinking blackmail is a special form of extortion in which the threat is a threat of exposure. But when I sought support from the dictionary, I was disappointed.

Dictionaries are histories of usage; not arbiters of meaning. If they were, language would not change in meaning (only add new words) from the moment the first dictionaries were made.

See here

That is surprising. It seems that using 'blackmail' to refer to extortion isn't even a corruption of the original use.

Indeed, we have this account of the etymology from George MacDonald Fraser's The Steel Bonnets:

Deprived of the protection of law, neglected by his superiors, and too weak to resist his despoilers, the ordinary man's only course was the payment of blackmail. This practice is probably as old as time, but the expression itself was coined on the Borders, and meant something different from blackmail today. Its literal meaning is "black rent" -- in other words, illegal rent -- and its exact modern equivalence is the protection racket.

Blackmail was paid by the tenant or farmer to a "superior" who might be a powerful reiver, or even an outlaw, and in return the reiver not only left him alone, but was also obliged to protect him from other raiders and to recover his goods if they were carried off.

Note that he does consider the modern meaning to be more specialized.

They are certainly used synonymously often enough to get into the dictionary that way. I didn't say it was wrong, I said I wish it weren't used that way.

If you have a case for why it is bad for 'blackmail' to mean 'extortion' (i.e., you can demonstrate that precision is desirable or something) then make the case. If it's a good case (I expect it will be; 4 karma points on a new-ish article at time of this comment suggests it is widely recognised) then people - most definitely me included - will start making the distinction you wish for.

(This is how language - prevailing terminology - changes! Ain't it cool?)

In general, I think synonyms are bad. It's a waste of vocabulary to have two words that mean the same thing in the same language unless there is something meaningfully different about them (connotation, scope, flavor, nuance, something). When "blackmail" just means "extortion", and not a kind of extortion (the threat to reveal incriminating information), the words become synonyms, instead of one of them being a special case of the other.

Yes, I have a similar rule. "Disinterested" has been used to mean "uninterested" for all of its history IIRC, but I support efforts to stop using it that way and keep it for its distinct meaning of "with no stake in the outcome" because synonyms are wasteful.

I agree in principle, but in practice I fudge this when the meaning is clear from context, because I hate the rhythm of "uninterested". (I use "not interested" instead when I can, but sometimes it sounds more graceful to use "disinterested", and sometimes I do it. Maybe I should try harder to stop.)

Agreed. From now on I will use blackmail to refer to extortion involving the threat to reveal incriminations, and if I encounter confusion, I will either direct them to this discussion or use rhetoric / appeal to my own authority to convince them of the truth of my position, depending on which I judge to have the better chance of actually convincing them.

Sorry to be so formal and spell it all out, but I just recently worked this unconscious process out and I am bursting with enthusiasm to share it!

(Note that the field of linguistics uses the phrase 'perfect synonym' to refer to what you mean by synonym, and when they say synonym they allow possible variances of nuance. Note also that I think their definitions are not in touch with the definitions for 'synonym' that people actually use, so more fool them.)

So "synonym" in common usage is a perfect synonym for "perfect synonym"?

Sorry to be so formal and spell it all out, but I just recently worked this unconscious process out and I am bursting with enthusiasm to share it!

Not at all, it's nifty. I'm sort of tickled to have discovered someone who will use words how I want them if I explain why they should.

Do most people not do that? In my experience if I tell people not to do certain things (as long as the things aren't too ridiculous-- I have no expectation that anyone would stop breathing because KATYDEE COMMANDS IT), they stop doing those things, or at least stop doing them around me. There are some irritating exceptions-- the number of people who respond "Why?" to "Be quiet" or "Don't talk to me" is staggeringly high-- but by and large people tend to respect such preferences in my experience.

I wouldn't have been uncommonly impressed if shokwave had agreed to use "blackmail" and "extortion" as I prefer while talking to me (although the local context makes that sort of acquiescence less likely than it would be in most social groups, I think). But the great-grandparent seems to indicate a commitment to use the words the way I like them in all contexts and to go so far as to evangelize my linguistic beliefs.

Do most people not do that?

Most people will indeed adopt different terminology, given a good reason; it's just that some people have extensive experience of others not complying with such requests because the reasons are ridiculous, and then infer such rejection to be a more general phenomenon.

Example:

A: [Activity X] will tend to make you more sexually attractive to [group Y] because of [mechanism Z].
B: You shouldn't say that because it's offensive to Ys and treats them like non-persons mindlessly responding to X, and I don't like that. And I don't like X, either.
C: Are you insane? I can't ignore real-world social phenomena that affect my life like what A described, just because it offends you and you have unusual preferences. Try to think about how others might feel.
B: Bah! Blast these terrorists who won't listen to the voice of reason! Where can I find less defective people?

Note that the field of linguistics uses the phrase 'perfect synonym' to refer to what you mean by synonym, and when they say synonym they allow possible variances of nuance.

Anyway, it depends on how much variance of nuance you want to allow. (Does the fact that extortion is Latinate and blackmail is Germanic count for anything?) I've seen a claim that no language has truly perfect synonyms (i.e. two words such that P(X|someone says word1) = P(X|someone says word2) for all X in all circumstances), which might well be true, but which would make the phrase perfect synonym useless.

The default is special because it costs the other person time/money/effort to do anything other than the default.

Hence, not blowing up your car is the default, but so is not giving you food.

I feel like what people call blackmail is largely related to intentionality. The blackmailer goes out of their way to harm you should you not cooperate.

In the trade example, by contrast, if someone wants to trade and you don't, and you need the object but don't trade, we don't attribute that to the other person trying to harm you.

When you are presented with blackmail, or with trade, the "default" is not what actually happens; it's impossible, and might as well be logically counterfactual (and you know that, even if the other agent can't). If all we know is that it's counterfactual, then we might as well consider "non-default" everything that has opportunity cost compared to the equally counterfactual "default" of the Flying Spaghetti Monster granting you $1000.

The person blackmailing you doesn't have the option of having the FSM grant them $1000

They do have the option of not blackmailing you.

Just because they are blackmailing you doesn't make them not blackmailing you impossible. If they wanted not to blackmail you, they wouldn't be blackmailing you.

The whole point of precommitting not to give in to blackmail, and not to negotiate with terrorists, is the fact that they have the option to do nothing, and if you're not going to give in, they're better off sticking with that option.

So, you precommit not to give in, and this decreases the chance that you'll be threatened in the first place.
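
The deterrence argument can be sketched as a one-line expected-value comparison from the blackmailer's side. This is a toy model under stated assumptions: the "do nothing" option is worth 0 to the blackmailer, making a threat has a fixed cost, and the function name `blackmailer_threatens` and all numbers are hypothetical.

```python
def blackmailer_threatens(p_give_in, demand, threat_cost):
    """Return True if threatening beats the blackmailer's 'do nothing' option.

    p_give_in:   blackmailer's estimate that the victim pays up.
    demand:      amount received if the victim pays.
    threat_cost: cost of making the threat (effort, risk), paid either way.
    'Do nothing' is assumed to be worth 0.
    """
    expected_gain = p_give_in * demand - threat_cost
    return expected_gain > 0


print(blackmailer_threatens(0.5, 1000, 50))  # True: the victim might pay
print(blackmailer_threatens(0.0, 1000, 50))  # False: credible precommitment
```

In this sketch the precommitment works only through `p_give_in`: once the blackmailer believes it is zero, doing nothing is strictly better than threatening.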

The person blackmailing you doesn't have the option of having the FSM grant them $1000

They do have the option of not blackmailing you.

The question is, what's the difference between the two, formally? Neither actually happened, both are counterfactual. (The assumption is that you are already facing a blackmail attempt, trying to decide whether to give in.)

This refers to a significant surprising conclusion in decision theory (at least, UDT-style): which action is correct depends on how you reason about logically impossible situations, so it's important to reason about the logically impossible situations correctly. But it's still not clear where the criteria for correctness of such reasoning should come from.

The question is, what's the difference between the two, formally?

One is a case where a precommitment makes a difference, the other isn't.

Had you convincingly precommitted to not giving in to blackmail* you would not have been blackmailed.

Had you convincingly precommitted to getting the FSM to grant your blackmailer $1000, the FSM still wouldn't exist.

*(which is not an impossible counterfactual+ it's something that could have happened, with only relatively minor changes to the world.)

+[unless you want to define "impossible" such that anything which doesn't happen was impossible, at which point it's not an unpossible counterfactual, and I'm annoyed :p]

which action is correct depends on how you reason about logically impossible situations

A logically impossible situation is one which couldn't happen in any logically consistent world. There are plenty of logically consistent worlds in which the person blackmailing you instead doesn't.

So, it's definitely not logically impossible. You could call it impossible (though, as above, that non-standard usage would irritate me) but it's not logically impossible.

Couldn't you also convincingly precommit to accept the corresponding positive-sum trade?

Yes. But why would you need to? In the positive-sum trade scenario, you're gaining from the trade, so precommitting to accept it is unnecessary.

If you mean that I could precommit to only accept extremely favourable terms: well, if I do that, they'll choose someone else to trade with, just as the threatener would choose someone else to threaten.

Them choosing to trade with someone else is bad for me. The threatener choosing someone else to threaten is good for me.

/\ That is, in many ways, the most important distinction between the scenarios. I want the threatener to pick someone else. I want the trader to pick me.

One is a case where a precommitment makes a difference, the other isn't.

Obviously, the question is, why, what feature allows you to make that distinction.

Had you convincingly precommitted to not giving in to blackmail* you would not have been blackmailed.

Had you convincingly precommitted to getting the FSM to grant your blackmailer $1000, the FSM still wouldn't exist.

The open question is how to reason about these situations and know to distinguish them in such reasoning.

A logically impossible situation is one which couldn't happen in any logically consistent world.

"Worlds" can't be logically consistent or inconsistent, at least it's not clear what is the referent of the term "inconsistent world", other than "no information".

And again, why would one care about existence of some world where something is possible, if it's not the world one wants to control? If the definition of what you care about is included, the facts that place a situation in contradiction with that definition make the result inconsistent.

Obviously, the question is, why, what feature allows you to make that distinction.

Well, in one case, there are a set of alterations you could make to your past self's mind that would change the events.

In the other, there aren't.

And again, why would one care about existence of some world where something is possible, if it's not the world one wants to control?

Because it allows you to consistently reason about cause and effect efficiently.

Because it allows you to consistently reason about cause and effect efficiently.

If it's not about the effect in the actual world, why is it relevant?

If it's not about the effect in the actual world, why is it relevant?

If I ask "What will happen if I don't attempt to increase my rationality" I'm reasoning about counterfactuals.

Is that not about cause and effect in the real world?

Counterfactuals ARE about the actual world. They're a way of analysing the chains of cause and effect.

If you can't reason about cause and effect (and with your inability to understand why precommitting can't bring the FSM into existence, I get the impression you're having trouble there) you need tools. Counterfactuals are a tool for reasoning about cause and effect.

Only one of possible-to-not-blackmail or his-noodliness-exists is consistent with the evidence, to very high probabilities.

Worlds, in the Tegmark-ey sense of a collection of rules and initial conditions, can quite easily be consistent or inconsistent.

You seem to be beating a confusing retreat here. I bet there's a better tack to take.

The question is, what's the difference between the two, formally? Neither actually happened, both are counterfactual. (The assumption is that you are already facing a blackmail attempt, trying to decide whether to give in.)

The blackmailer has the option of backing down at any point, and letting you go for free. It may be unlikely, but it's not logically impossible.

"Give me $1000 or I'll blow up your car!"

"I have a longstanding history of not negotiating with terrorists. In fact, last month someone slashed my tires because I wouldn't give them $20. Check the police blotter if you don't believe me."

"Oh, alright. I'll just take my bomb and go hassle someone more tractable."

It may be unlikely, but it's not logically impossible.

Assume it is, as part of the problem statement. Only allow agent-consistency (the agent can't prove otherwise) of it being possible for the other player to not blackmail, without allowing actual logical consistency of that event. Also, assume that our agent has actually observed that the other decided to blackmail, and there is no possibility of causal negotiation.

(This helps to remove the wiggle-room in foggy reasoning about decision-making.)

Good point - yet it seems that the costs must ultimately be analysed as opportunity cost. Game theoretically, as reducing their payout to reduce yours. However, if a crazy person who enjoys blowing up cars tells you to give them $10,000 or they'll blow up your car, it's both the case that 1) You're being blackmailed and 2) They would benefit from (prefer) blowing up your car.

Current guess.

Blackmailing is a class of situations similar to Counterfactual Mugging, where you are willing to sacrifice utility in the actual world, in order to control its probability into being lower, so that the counterfactual worlds (that have higher utility) will gain as much probability as possible, and will thus improve the overall expected utility, even as utility of the actual world becomes lower.

Or, simply, you are being blackmailed when you wish this wouldn't be happening, and the correct actions are those that make the reality as improbable as possible.

(In Counterfactual Mugging, you are sacrificing utility in the actual world in order to improve utility of the counterfactual world, while in blackmailing, you are doing the same in order to improve its probability.)
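
A worked toy example of this guess, under loudly stated assumptions: there are only two worlds (blackmailed or not), the not-blackmailed world has utility 0, and your policy influences how likely you are to be blackmailed at all. The probabilities and utilities are hypothetical.

```python
def expected_utility(p_blackmailed, u_if_blackmailed, u_if_not=0.0):
    """Expected utility over two worlds: blackmailed and not blackmailed."""
    return p_blackmailed * u_if_blackmailed + (1.0 - p_blackmailed) * u_if_not


# Hypothetical numbers. Each policy fixes
# (probability of being blackmailed, utility in the world where it happens).
policies = {
    "give in": (0.9, -1000.0),  # pay up; known payers attract threats
    "refuse": (0.1, -5000.0),   # threat is carried out, but threats are rare
}

for name, (p, u) in policies.items():
    print(name, expected_utility(p, u))

best = max(policies, key=lambda name: expected_utility(*policies[name]))
print("best policy:", best)
```

Under these assumed numbers, refusing is worse inside the blackmailed world (-5000 against -1000) yet better in expectation, because it moves probability toward the higher-utility world where no blackmail occurs.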

This definition is too broad. It fits the person doing the blackmailing (in a world where you reject my threat, I will act against my local best interest and blow up the bombs), just as well as the person being blackmailed (in a world where you have precommitted to bomb me, I will act against my local self-interest and defy you). It fits many types of negotiations over deals and such.

It fits the person doing the blackmailing (in a world where you reject my threat, I will act against my local best interest and blow up the bombs), just as well as the person being blackmailed (in a world where you have precommitted to bomb me, I will act against my local self-interest and defy you).

You omit some counterfactuals by framing them as located outside the scope of the game. If you return them, the pattern no longer fits. For example, the blackmailer can decide to not blackmail on both sides of victim's decision to give in, so the utility of counterfactuals outside the situation where blackmailer decided to blackmail and the victim didn't give in is still under blackmailer's control, which it shouldn't be according to the pattern I proposed.

I don't quite see your point. If you take a nuclear blackmailer, then it follows the same pattern: he is committing to a locally negative course (blowing up nukes that will doom them both) so that the probability of that world is diminished, and the probability of the world where his victim gives in goes up. How does this not follow your pattern?

You assume causal screening off, but humans think acausally, with no regard for observational impossibility, which is much more apparent in games. If, after you're in the situation of having unsuccessfully blackmailed the other, you can still consider not blackmailing (in particular, if blackmail probably doesn't work), then you get a decision that changes utility of the collection of counterfactuals outside the current observations, which the blackmailed (by my definition) are not granted. The blackmailed have to only be able to manipulate probability of counterfactuals, not their utility. (That's my guess as to why our brains label this situation "not getting blackmailed".)

I need examples to get any further in understanding. Can you give a toy model that is certainly blackmail according to your definition, so that I can contrast it with other situations?

Can you give a toy model that is certainly blackmail according to your definition, so that I can contrast it with other situations?

I don't understand. Simple blackmail is certainly blackmail. The problem here seemed to be with games that are bigger than that, why do you ask about simple blackmail, which you certainly already understood from my first description?

Isn't it because you want to incentivize people to bargain with you but incentivize them not to blackmail you?

This doesn't help with answering the question of what "blackmail" means, and how useful it is for decision theory. Or alternatively, it expresses the hypothesis that the "blackmail" category is a trivial restating of the decision, not a property of the decision problem.

No, it expresses the hypothesis that blackmail occurs when:

a. Someone has harmed you relative to default.
b. You now have a choice between something worse for them and something better for them.
c. In the short term, you'd prefer what was better for them.

If "a" fails, either they did what was good for you and you should reward them, or they played default and you shouldn't try to punish them, as 1) you can't send a clear signal that you punish people like that and 2) you're likely to get punished yourself.

If "b" fails, not a whole lot you can do about it.

If "c" fails, it's not "failing to respond to blackmail", it's "not being stupid."

The trick with extortion & terrorism is that you give someone a sufficient direct incentive to help you. The incentive being "or else I'll reveal your secrets/blow up your building/punch your baby/whatever." The reason the advice is given is because it's nonobvious whether you should negotiate or stick to a policy of not negotiating.

Sometimes people do give in to blackmail. For example, some countries paid off the Somali pirates, while the US fought them off. It's a strategic choice with a nonobvious answer. This is because "don't give in to blackmail" is not universally applicable advice.

Isn't the default just what would happen if the other person never communicated with you?

But they did communicate with you, as a result of a somewhat deterministic decision process, and not by random choice. How should you reason about this counterfactual? Why doesn't the "false" assumption of their never communicating with you imply that the Moon is made out of cheese?

People engage in this kind of counterfactual reasoning all the time without declaring the moon to be made of cheese; I'm not sure why you're questioning it here. If it makes it any easier, think of it as being about the change in expected value immediately after the communication vs. the expected value immediately before the communication - in other words, whether the communication is a positive or negative surprise.
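As a minimal sketch of that "surprise" test, assuming you can put a number on your expected value just before and just after the message arrives (the function name and the sample figures are made up for illustration):

```python
def classify_communication(ev_before: float, ev_after: float) -> str:
    """Label a message by how it shifts the recipient's expected value.

    ev_before: expected value just before the message arrived.
    ev_after:  expected value just after it, assuming the recipient then
               picks the best option still open to them.
    """
    if ev_after < ev_before:
        return "negative surprise (threat-like)"
    if ev_after > ev_before:
        return "positive surprise (offer-like)"
    return "neutral"


# Hypothetical numbers: a threatening note versus a seller's price quote.
print(classify_communication(ev_before=0.0, ev_after=-50.0))
print(classify_communication(ev_before=0.0, ev_after=5.0))
```

This only restates the heuristic; it says nothing about where the "before" estimate comes from, which is the part being questioned.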

People engage in this kind of counterfactual reasoning all the time without declaring the moon to be made of cheese

Indeed. How do they manage that? That's one fascinating question.

I think they have an underlying premise that they will believe whatever is necessary to make their lives better, or at least not worse.

Their beliefs about what's better and worse may never be examined, so some of their actions may be obviously sub-optimal. However, they won't fall into thinking that one contradiction means they're obligated to believe every piece of obvious nonsense.

When reasoning about counterfactuals, a good principle is never to reach for a more distant* world than necessary.

*(less similar)

If you were to simulate the universe as it was before they contacted you, and make 1 single alteration (tapping their brain so they decide not to contact you) would the simulation's moon be made of green cheese?

That universe is pretty much the closest possible universe to ours where they don't contact you.

Why ought merely similar worlds to be relevant at all? There could be ways of approximate reasoning about a complicated definition of the actual world you care about, but actually normatively caring about the worlds that you know not to be actual (i.e. the one you actually care about) is a contradiction in terms.

You asked how to reason about counterfactuals.

I answered.

Why ought merely similar worlds to be relevant at all?

I'm not sure what you're asking now. Could you please clarify?

The reason I think about counterfactuals is to understand cause and effect. If you change something, then anything to which it is a cause must also change.

But things (such as whether the moon is made of green cheese) which AREN'T caused by the something, would not change.

You asked how to reason about counterfactuals.

I answered.

You answered informally. It's easy, and not what I wondered about.

The reason I think about counterfactuals is to understand cause and effect. If you change something, then anything to which it is a cause must also change.

Things don't change. When you make a decision, you are not changing the future, you are deciding the future. The future is what it is given your actual decision, all else is fantasy (logically inconsistent even, because the structure of your own mind implies only one decision, when you ignore some unimportant cognitive noise), perhaps morally important fantasy whose structure we ought to understand, but still not the reality.

You answered informally. It's easy, and not what I wondered about.

What did you wonder about? You seemed to be wondering why you shouldn't just go "Well, if they'd done that, the moon would have been made of cheese".

If you can't think about counterfactuals such as "What will happen if I do X?" "What will happen if I do Y?" etc., you can't make rational decisions.

You may wish to dismiss counterfactuals as fantasy, but does doing so help you come to good decisions? Or does it hinder you?

The goal is not to dismiss counterfactuals, but to understand where they come from, and how they are relevant for reasoning about the actual world. Distinguish inability to reason informally from lack of formal understanding of the structure of that informal reasoning.

The goal is not to dismiss counterfactuals, but to understand where they come from,

They are a mode of thought. They come from the thinker.

and how they are relevant for reasoning about the actual world.

They allow you to look at cause and effect. Without counterfactuals, you can't reason about cause and effect, you can only reason about correlation.

Taboo cause, effect, taboo counterfactuals. That something is, doesn't answer why it's normatively useful ("they come from the thinker").

Taboo cause, effect, taboo counterfactuals.

Okay: Thinking about how things would differ now, or in future, based on a slightly modified version of the past, allows us to accurately consider what the world could be like, in the future, based on our options in the present.

That something is, doesn't answer why it's normatively useful ("they come from the thinker").

You said you sought to understand where they came from. That they come from the thinker is an answer to that. I answered how they're relevant in the second part of the post (and the first part of this one).

You ignore these points and repeat something contradictory to them, which is wrong in a debate even if you don't accept them. You (or I) need to find another path, and not rehash the same ground.

Okay, I'll go point to point, and try and understand what you meant in that post, that you think I'm ignoring.

Things don't change.

This is simply false, as a statement, so I won't treat it on its own.

When you make a decision, you are not changing the future, you are deciding the future.

This is fine. Sure. My post works fine within such a structure.

The future is what it is given your actual decision,

True. But making choices requires that one accept that one doesn't know what the future is, nor does one know what one's decision will be. It requires the use of "if... then" thoughts, or counterfactuals.

So, nope, not ignored, just irrelevant.

all else is fantasy

Emotional dismissal, not an actual point.

(logically inconsistent even, because the structure of your own mind implies only one decision, when you ignore some unimportant cognitive noise)

A good counterfactual should be logically consistent. It isn't the real world, but the real world isn't the only logically consistent possible world.

Perhaps you're making the same mistake as you made with the term "logically impossible" earlier?

perhaps morally important fantasy whose structure we ought to understand, but still not the reality.

Dismissal, not an actual point.

EDIT: So, which of those are you claiming I contradicted exactly?

Things don't change.

This is simply false, as a statement, so I won't treat it on its own.

It has an intended interpretation that isn't false, which I referred to in the following statements, which you've accepted. (It's more of a summary than a separate point.)

The future is what it is given your actual decision,

True. But making choices requires that one accept that one doesn't know what the future is, nor does one know what one's decision will be. It requires the use of "if... then" thoughts, or counterfactuals.

Yes. If there is a (logical) fact in what your actual decision is, say it's actually A, and you are uncertain about what it'll be, then the assumption A=B is logically false, inconsistent, even if you don't know that it is. When you reason about what happens if A=B, not knowing that it's a false statement, you are reasoning from a false premise, and everything logically follows from a false premise. This is the relevance of this description.

all else is fantasy

Emotional dismissal, not an actual point.

Not emotional. What else is there? There is reality, and then all the thoughts you can have to reason about reality.

A good counterfactual should be logically consistent. It isn't the real world, but the real world isn't the only logically consistent possible world.

If there is a fact of the matter of what your action is, then assuming a possible action that is not actual is logically inconsistent. This is normal. If you are considering something that is not the real world, you need to explain what relation it has to the real world, and how this particular not-real-world is different from all the other not-real-worlds, and what this not-real-world actually is, especially if it's inconsistent, but even if it's consistent, there is still the same question of what privileges it, since it's not the real world, and real world is what you want to reason about.

perhaps morally important fantasy whose structure we ought to understand, but still not the reality.

Dismissal, not an actual point.

The point is that it's unclear what relation there is between the counterfactuals and reality, given that counterfactuals are usually not the reality.

EDIT: So, which of those are you claiming I contradicted exactly?

You referred to "slightly modified version of the past", and modifying things is starting to consider things other than reality, where it becomes unclear how considering those not-real things helps to understand reality. (Uncertainty is a much better concept than change in this context.) I would further qualify that you can't change your notion of reality without moving away from your original notion of reality, and thus conceptualizing something other than reality, when what you wish to understand is reality, and not this not-reality you've constructed by modifying the concept.

Yes. If there is a fact in what your actual decision is, say it's actually A, and you are uncertain about what it'll be, then the assumption A=B is logically false, inconsistent, even if you don't know that it is.

Where is the inconsistency?

If you assume both the actual action (A) and the possible, but not actual, action (B), you have an inconsistency.

But if you assume only B, i.e. you assume a world as similar to this one as possible where you are such that you will choose action B, then it is perfectly consistent.

When you reason about what happens if A=B, not knowing that it's a false statement, you are reasoning from a false premise, and everything logically follows from a false premise. This is the relevance of this description.

But, as I have already explained, reasoning about counterfactuals is not, in fact, reasoning in such a way. The statement "If I am in front of you, I am punching you in the face" is true. The statement "If I were in front of you, I would be punching you in the face" is a different, false, statement.

Similarly "If I did not post this comment you are a pizza" is true, but "If I had not posted this comment, you would be a pizza" is false.
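One way to make the distinction in those two examples concrete, as a toy sketch only: the first reading is the material conditional of classical logic, the second reruns a model of the world with a single alteration. The `toy_world` function is a hypothetical stand-in, and it builds in the assumption that whether you are a pizza does not depend on my posting.

```python
def material_conditional(antecedent: bool, consequent: bool) -> bool:
    """Classical 'if P then Q': false only when P is true and Q is false."""
    return (not antecedent) or consequent


def toy_world(comment_posted: bool) -> dict:
    """Hypothetical model of the world as a function of one decision.

    Assumption: whether the reader is a pizza does not depend on the comment.
    """
    return {"comment_posted": comment_posted, "reader_is_pizza": False}


actual = toy_world(comment_posted=True)

# Indicative reading: the antecedent "I did not post" is false in the actual
# world, so the material conditional holds vacuously.
indicative = material_conditional(not actual["comment_posted"],
                                  actual["reader_is_pizza"])

# Counterfactual reading: rerun the model with the single alteration and
# read off the consequent in that altered run.
counterfactual = toy_world(comment_posted=False)["reader_is_pizza"]

print(indicative)      # True
print(counterfactual)  # False
```

The sketch takes the model as given; it does not say why this particular model is the right one to rerun.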

Until you are willing to grapple with your confusion on this issue, I don't see this conversation being productive.

If you assume both the actual action (A) and the possible, but not actual, action (B), you have an inconsistency.

Yes.

But if you assume only B, i.e. you assume a world as similar to this one as possible where you are such that you will choose action B, then it is perfectly consistent.

You need to show that this other "similar" world has any relevance to reasoning about the actual world. You are not justified in considering a problem statement that stars a non-actual world unless you explain how that helps in reasoning about the actual world.

Of course we know that intuitively that's how it works, but it's not clear why, and "just so" doesn't help in understanding this mystery. Also, it's not clear how to generally construct those counterfactuals, even if we leave the question of their relevance aside. Where is the "you" that ought to be replaced in the environment? What about the thoughts about you in other people's minds, should they be modified as well? If not, you run into the pitfalls of CDT.

Until you are willing to grapple with your confusion on this issue, I don't see this conversation being productive.

I'm trying to argue that you should be confused, just as I am confused. Notice your own confusion and all.

You need to show that this other "similar" world has any relevance to reasoning about the actual world.

It allows you to think about cause and effect, and it is a necessity in making rational choices. You cannot make a rational choice without thinking through the consequences of different (counterfactual) possible choices.

You are not justified in considering a problem statement that stars a non-actual world unless you explain how that helps in reasoning about the actual world.

In reasoning with counterfactuals, the counterfactual worlds aren't "stars" and they certainly aren't "problem statement"s. They're tools for thinking about the world and making choices.

but it's not clear why,

You keep on making it less clear for yourself, by bringing in things like the principle of explosion, which is irrelevant.

Also, it's not clear how to generally construct those counterfactuals, even if we leave the question of their relevance aside.

I've explained a general method for constructing the counterfactuals already. You assume the smallest possible divergence that could lead to the specific divergence you're interested in.

Where is the "you" that ought to be replaced in the environment?

That's a problem with personal identity, not counterfactuals. Sure, it's an important problem, but adding more confusions to this discussion will not help you to understand counterfactuals.

What about the thoughts about you in other people's minds, should they be modified as well?

To the extent that they were caused by the properties you had at the time.

I'm trying to argue that you should be confused, just as I am confused. Notice your own confusion and all.

But all the reasons you've given for your confusion seem to be trivially irrelevant, or incorrect. The reasons for your confusion seem to be confusions, not good reasons.

Curious. I was arguing motivation for study of TDT/UDT/ADT, without betraying any knowledge about results that are already known. And you've managed to rule out all combinations of confusions these theories are intended to resolve as being irrelevant. The general pattern I see here is that any individual question has an "obvious" intuitive answer, especially if you don't go into detail, and you refuse to either consider multiple questions at once (since they are "unrelated"), or to go deeply enough into each of them individually (since if you assume the intuitive understanding of the other questions, they provide strong enough support for not being confused about this one too).

In other words, you are trapped in the net of intuitive understanding of multiple concepts that help in understanding each other, and are comfortable with this level of understanding, which makes any attempt to look deeper into their nature preposterous to you.

You were arguing from a position whereby you couldn't tell the difference between the statements "If I had precommitted to not give in to blackmail I wouldn't have been blackmailed" and "If I had precommitted to call upon the FSM for help, the FSM would exist".

That is a very confused position. I have since explained the difference between those things.

Every confusion you have actually brought to the fore, I have clarified, with the exception of the confusion of "what makes personal identity", because that wasn't the topic at hand. It's a big, complicated, and separable issue. And yes, the personal identity issue leads to some changes in decision theory. But we're not talking about decision theory at the moment.

If you think I haven't clarified one of your confusions, please point it out? Because, honestly, you seem to be just plain ignoring any attempts at clarification.

Because, honestly, you seem to be just plain ignoring any attempts at clarification.

Informal, intuitive attempts at clarifications. Attempts at clarification that don't give deep understanding of what's going on. The standard of understanding I was aiming at, in particular by refusing to accept less formal explanations.

So you say, but you don't point out any difficulty with them.

You simply dismiss them like that.

Your confusions are faux-logical (talking about worlds being logically impossible, when they're not; talking about the principle of explosion, when it doesn't apply); if you want a thorough clarification, give a thorough problem.

all else is fantasy

I am not sure that I am correct. But there seems to be another possibility.

If we assume that the world is a model of some formal theory, then counterfactuals are models of different formal theories, whose models have finite isomorphic subsets (reality accessible to the agent before it makes a decision).

Thus counterfactuals aren't inconsistent, as they use different formal theories, and they are important because the agent cannot decide which one applies to the world before it makes a decision.

I'm getting this more clearly figured out. In the language of ambient control, we have: You-program, Mailer-program, World-program, Your utility, Mailer utility

"Mailer" here doesn't mean anything. Anyone could be a mailer.

It is simpler with one mailer but this can be extended to a multiple-mailer situation.

We write your utility as a function of your actions and the mailer's actions based on ambient control. This allows us to consider what would happen if you changed one action and left everything else constant. If you would have a lower utility, we define this to be a "sacrificial action".

A "policy" is a strategy in which one plays a sacrificial action in a certain class of situation.

A "workable policy" is a policy where playing it will induce the mailer to model you as an agent that plays that policy for a significant proportion of the times you play together, either for:

  1. causal reasons - they see you play the policy and deduce you will probably continue to play it, or they see you not play it and deduce that you probably won't

  2. acausal reasons - they accurately model you and predict that you will/won't use the policy.

A "beneficial workable policy" is when this modeling will increase your utility.

Depending on the costs/benefits, a beneficial workable policy could be rational or irrational, determined using normal decision theory. The name people use for it is unrelated - people have given in to and stood up against blackmail, they have given in to and stood up against terrorism, they have helped those who helped them or not helped them.

Not responding to blackmail is a specific kind of policy that is frequently, when dealing with humans, workable. It deals with a conceptual category that humans create without fundamental decision-theoretic relevance.

We write your utility as a function of your actions and the mailer's actions based on ambient control. This allows us to consider what would happen if you changed one action and left everything else constant.

It doesn't (at least not by varying one argument of that function), because of explicit dependence bias (this time I'm certain of it). Your action can acausally control the other agent's action, so if you only resolve uncertainty about the parameter of utility function that corresponds to your action, you are being logically rude by not taking into account possible inferences about the other agent's actions (the same way as CDT is logically rude in only considering the inferences that align with definition of physical causality). From this, "sacrificial action" is not well-defined.
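A rough sketch of the objection, with made-up payoffs and a made-up predictor (`PAYOFF` and `mailer_response` are illustrative assumptions, not part of any formal statement of ambient control). Varying only your own argument of the utility function gives one ranking of the policies; letting the mailer's action depend on your policy gives another.

```python
# Hypothetical payoffs to you, indexed by (your_policy, mailer_action).
PAYOFF = {
    ("pay", "blackmail"): -10,
    ("refuse", "blackmail"): -100,
    ("pay", "abstain"): 0,
    ("refuse", "abstain"): 0,
}


def mailer_response(your_policy: str) -> str:
    """Assumed predictor: the mailer blackmails only those who would pay."""
    return "blackmail" if your_policy == "pay" else "abstain"


def utility_holding_mailer_fixed(your_policy: str, mailer_action: str) -> int:
    """Vary only your own argument; the mailer's action is a constant."""
    return PAYOFF[(your_policy, mailer_action)]


def utility_with_dependence(your_policy: str) -> int:
    """Let the mailer's action be inferred from your policy."""
    return PAYOFF[(your_policy, mailer_response(your_policy))]


policies = ("pay", "refuse")
fixed = {p: utility_holding_mailer_fixed(p, "blackmail") for p in policies}
dependent = {p: utility_with_dependence(p) for p in policies}

print(fixed)      # {'pay': -10, 'refuse': -100}
print(dependent)  # {'pay': -10, 'refuse': 0}
```

With the mailer held fixed, refusing looks like a "sacrificial action"; once the dependence is taken into account, it does not, which is why the label depends on how the function is read.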

I think you're mostly right. This suggests that a better policy than 'don't respond to blackmail' is 'don't respond to blackmail if and only if you believe the blackmailer to be someone who is capable of accurately modelling you'.

Unfortunately this only works if you have perfect knowledge of blackmailers and cannot be fooled by one who pretends to be less intelligent than they actually are.

This also suggests a possible meta-strategy for blackmailers, namely "don't allow considerations of whether someone will pay to affect your decision of whether to blackmail them", since if blackmailers were known to do this then "don't pay blackmailers" would no longer work.

I would also suggest that while blackmail works with some agents and not others, it isn't human-specific. For example, poison arrow frogs seem like a good example of evolution using a similar strategy, having an adaptation that is in no way directly beneficial (and presumably is at least a little costly) that exists purely to minimize the utility of animals which do not do what it wants.

Unfortunately this only works if you have perfect knowledge of blackmailers and cannot be fooled by one who pretends to be less intelligent than they actually are.

Not perfect knowledge, just some knowledge together with awareness that you can't reason from it in certain otherwise applicable heuristic ways because of the incentives to deceive.

Yes, that's what I meant. I have a bad habit of saying 'perfect knowledge' where I mean 'enough knowledge'.

Can I take it that since you criticized a criticism of this hypothesis without offering a criticism of your own, that you believe that this hypothesis is correct?

What hypothesis?

My comment was entirely local, targeting a popular argument that demands perfect knowledge where any knowledge would suffice, similarly to the rhetoric device of demanding absolute certainty where you were already presented with plenty of evidence.

It's evidence that you have seen the comment that he's replying to, in which I lay out my hypothesis for the answer to your original question. (You've provided an answer which seems incomplete.)

Reducing the problem to a smaller problem, or another, already-existing problem, in a way that seems nonobvious to fellow lesswrongers (and therefore possibly wrong) is useful.

For example, my way resolves, or mostly resolves the blackmail/bargain distinction. Blackmail is when the pre-made choice is bad for you relative to the most reasonable other option, bargain is when it's good for you.

Maybe I can explain what's going on game-theoretically when I say "default" in this context.

You're trying to establish a Nash equilibrium of, for actions in that category X:

You don't do X / I punish you for doing X

Now the Schelling situation is that you may not be able to reach this equilibrium, if X is a strange and bizarre category, for instance, or if we'd prefer to prevent you from punishing us by locking you up instead.

So it may be that there is no one general category here. I could give in to terrorism but not blackmail, for instance. It's about clusters in harmful-action-space.

It seems to me the relevant difference is that in blackmail one or both parties end up worse off. So a group of individuals who blackmail each other tend to get poorer over time, compared to a group that successfully deters blackmail.

Agent 1 negotiates with agent 2. Agent 1 can take option A or B, while agent 2 can take option C or D. Agent 1 communicates that they will take option A if agent 2 takes option C and will take option B if agent 2 takes option D.

If utilities are such that for

  • agent 1: A > B, C < D, A + C < B + D

and for

  • agent 2: A < B, C > D, A + C < B + D

or

  • agent 1: A < B, C > D, A + C > B + D
  • agent 2: A > B, C < D, A + C > B + D

this is an offer.

If

  • agent 1: A < B, C < D, A + C < B + D
  • agent 2: A < B, C > D, A + C < B + D

or

  • agent 1: A > B, C > D, A + C > B + D
  • agent 2: A > B, C < D, A + C > B + D

this is blackmail by agent 1.

If

  • agent 1: A > B, C < D, A + C < B + D
  • agent 2: A < B, C < D, A + C < B + D

or

  • agent 1: A < B, C > D, A + C > B + D
  • agent 2: A > B, C > D, A + C > B + D

this is agent 1 giving in to agent 2's blackmail.

I don't think I mentioned anything about any "default" anywhere?

(Unless I overlooked something, in the other cases there is either no reason to negotiate, no prospect of success in negotiating, or at least one party acting irrationally. It is implicitly assumed that preferences between combinations of the options only depend on the preferences between the individual options.)
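The six cases above can be written down as a lookup table. This is only a transcription sketch: each agent's preferences are encoded as three comparison signs in the order (A vs B, C vs D, A+C vs B+D), and the two sample scenarios at the end are one hypothetical reading of a kidnapping and of a car sale.

```python
# Preferences per agent: (A vs B, C vs D, A+C vs B+D), where ">" means the
# agent prefers the left-hand side and "<" the right-hand side.
CASES = {
    ((">", "<", "<"), ("<", ">", "<")): "offer",
    (("<", ">", ">"), (">", "<", ">")): "offer",
    (("<", "<", "<"), ("<", ">", "<")): "blackmail by agent 1",
    ((">", ">", ">"), (">", "<", ">")): "blackmail by agent 1",
    ((">", "<", "<"), ("<", "<", "<")): "agent 1 giving in to agent 2's blackmail",
    (("<", ">", ">"), (">", ">", ">")): "agent 1 giving in to agent 2's blackmail",
}


def classify(agent1, agent2) -> str:
    """Look up a pair of preference triples among the six listed cases."""
    return CASES.get((tuple(agent1), tuple(agent2)),
                     "not covered by the six cases")


# Hypothetical kidnapping: A = harm the hostage, B = release the hostage,
# C = keep the money, D = pay. Assumes the kidnapper would rather not
# carry out the threat.
kidnapping = classify(("<", "<", "<"), ("<", ">", "<"))

# Hypothetical car sale: A = shop elsewhere, B = hand over the money,
# C = keep the car, D = hand over the car.
car_sale = classify((">", "<", "<"), ("<", ">", "<"))

print(kidnapping)  # blackmail by agent 1
print(car_sale)    # offer
```

Any pair of triples outside the table falls through to the "not covered" label, matching the note that the remaining combinations are not negotiations worth classifying.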

Notice that under this definition punishing someone for a crime is a form of blackmail.

I'm not sure that's a problem.

Or maybe: Change blackmail in the above to threat, and define blackmail as a threat not legitimized by social conventions.

Well, at least we've unpacked the concept of "default" into the concept of social conventions.

Or into a concept of ethics. Blackmail involves a threat of unethical punishment.

I think we can do better than that. In cases where the law is morally justified, punishing someone for a crime is retaliation. I think part of the intent of the concept of blackmail is that the threatened harm be unprovoked.

Agent 1 communicates that they will take option A if agent 2 takes option C and will take option B if agent 2 takes option D.

Correction: Retracted, likely wrong.

Explicit dependence bias detected. How agent 1 will decide generally depends on how agent 2 will decide (not just on the actual action, but on the algorithm, that is on how the action is defined, not just on what is being defined). In multi-agent games, this can't be sidestepped. And restatement of the problem can't sever ambient dependencies.

I don't see how that's relevant. "I will release the child iff you give me the money, otherwise kill them" still looks like blackmail in a way "I will give you the money iff you give me the car, otherwise go shopping somewhere else" does not, even once the agents decided for whatever reason to make their dependencies explicit.

Bias denied.

First, I make no claims about the outcome of the negotiation, so there is no way privileging any dependence over any other could bias my estimation thereof.

Second, I didn't make any claim about any actual dependence, merely about communication, and it would certainly be in the interest of a would-be blackmailer to frame the dependence in the most inescapable way they can.

Third, agent 2 would need to be able to model communicated dependencies sensibly no matter whether it has a concept of blackmail or not, but while how it models the dependence internally would have a bearing on whether the blackmail would be successful that's a separate problem and should have no influence on whether the agent can recognize the relative utilities.

I wasn't thinking clearly; I don't understand this as an instance of explicit dependence bias now, though it could be. I'll be working on this question, but no deadlines.

Suppose that Blackmail is

merely an affective category, a class of situations activating a certain psychological adaptation

-- then we should ask what features of the ancestral environment caused us to evolve it. We might understand it better in that case.

I suspect that the ancestral environment came with a very strong notion of a default outcome for a given human, in the absence of there being any particular negotiation, and also came with a clear notion of negative interaction (stabbing, hitting, kicking) versus positive interaction (giving fish, teaching how to hunt better, etc).

Uh, spending effort on hurting people is negative-sum and most likely lose-lose, while teaching someone to hunt is positive-sum lose-win. Or maybe you see some deeper mystery here that I'm not seeing?

The problem with "lose-lose" is that it relies upon there being a "default outcome given no interaction". Vladimir is trying to taboo this concept, at least in general. So I am going to focus on a relevant special case, namely specific interactions available in the ancestral environment.

Uh, spending effort on hurting people is negative-sum and most likely lose-lose

What ancestral environment are you thinking of?

My take: what we call "extortion" or "blackmail" is where agent A1 offers A2 a choice between X and Y, both of which are harmful to A2, and where A1 has selected X to be less harmful to A2 than Y with the intention of causing A2 to choose X.

"Not responding to blackmail" comprises A2 choosing Y over X whenever A2 suspects this is going on.

A1 can still get A2 to choose X over Y, even if A2 has a policy of not responding to blackmail, by not appearing to have selected X... that is, by not appearing to be blackmailing A2.

For example, if instead of "I will hurt you if you don't give me money" A1 says "I've just discovered that A3 is planning to hurt you! I can prevent it by taking certain steps on your behalf, but those steps are expensive, and I have other commitments for my money that are more important to me than averting your pain. But if you give me the money, I can take those steps, and you won't get hurt," A2 may not recognize this as blackmail, in which case A1 can finesse A2's policy.

Of course, any reasonably sophisticated human will recognize that as likely blackmail, so a kind of social arms race ensues. Real-world blackmail attempts can be very subtle. (ETA: That extortion is illegal also contributes to this, of course... subtle extortion attempts can reduce A1's legal liability, even when they don't actually fool anyone.)

(Indeed, in some cases A1 can fool themselves, which brings into question whether it's still blackmail. IMHO, the best way to think about cases like that is to stop treating people fooling themselves as unified agents, but that's way off-topic.)

Why is the "default" special here?

Because in a blackmail, I do not wish the trade to happen at all. Let the "default" outcome for a trade T be one where the trade doesn't happen. Assume that my partner (the Baron) gets to decide whether T happens or not.

If T is a blackmail, then every option is worse than not-T. So, if I can commit to ensuring that T is also negative for the Baron, then the Baron won't let T happen. This gives a definition for blackmail: a trade T where every option is worse than not-T, but where I can commit to actions that ensure that T is negative for the person that decides whether T happens or not.

Let's contrast this with another trade T, with no blackmail elements to it, where I am a monopolist or monopsonist. It is still to my advantage to credibly commit to rejecting everything if I don't get 99% of the profit. However, I am limited by the fact that I want the trade to happen; I can't commit to any option that is actually harmful to the Baron. He will trade with me as long as he doesn't lose; his 'default' ensures that I have to give him something.

Finally, most trades are not monopolist or monopsonist. In this case, it is not to my advantage to precommit to taking more than "my fair (market) share" of the profit, as that will cause the trade to fail; the Baron's default is higher (he can trade with others) so I have to offer him at least that.

Now, I don't want to go down the rabbit hole of dueling pre-commitments, or the proper decision-theoretic way of resolving the issue (blackmailing someone or precommitting to avoid blackmail are very similar processes). But it does show why you would want to precommit to a particular action in blackmail situations, but not in others: you do not control if the trade happens, and blackmails are trades that you do not want to see happen. You can call not-trading the 'default' if you wish, but the salient fact is that it is better for you, not that it is default.

Because in a blackmail, I do not wish the trade to happen at all.

Something has to happen, and you must choose from the options you're dealt. Maybe I don't wish to pay for my Internet connection, and would rather have the Flying Spaghetti Monster provide it to me free of charge, and also grant me $1000 as a bonus? This seems to qualify as not wishing I had to choose a provider at all. But in reality, I have to choose, and FSM is not available as an option, just as not being blackmailed is not available as an option (by assumption; the agent doesn't need to know that, only the problem statement that logically implies that).

The difference is that since blackmail is costly, there is no incentive to blackmail someone who will not give in to it, which makes people who won't give in better off than people who will. On the other hand, there is no incentive for a company to offer free services to someone who refuses to 'give in' and pay money.
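A minimal sketch of that incentive argument from the blackmailer's side, assuming the blackmailer acts on an estimate of how likely the target is to give in (the function names and all numbers are hypothetical):

```python
def expected_gain(p_gives_in: float, ransom: float, cost_of_threat: float) -> float:
    """Blackmailer's expected net payoff from issuing a threat.

    p_gives_in:     blackmailer's estimate that the target pays.
    ransom:         what the blackmailer receives if the target pays.
    cost_of_threat: what it costs to make the threat, paid either way.
    """
    return p_gives_in * ransom - cost_of_threat


def will_blackmail(p_gives_in: float, ransom: float, cost_of_threat: float) -> bool:
    """Threaten only when the expected net payoff is positive."""
    return expected_gain(p_gives_in, ransom, cost_of_threat) > 0


# A target known to give in is worth threatening; one known to refuse is not.
print(will_blackmail(p_gives_in=1.0, ransom=100.0, cost_of_threat=5.0))  # True
print(will_blackmail(p_gives_in=0.0, ransom=100.0, cost_of_threat=5.0))  # False
```

The argument leans on the blackmailer's estimate being reasonably accurate; with a poor estimate, a refuser can still end up threatened.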

I think the logic is along the lines of "make the decision which, if the other party knew you were going to make it, would maximise your expected utility".

Which shows exactly why the rule is not universally applicable - the other party does not, in general, know what decision you're going to make (though they can predict it to some level of accuracy), and so there's a cost/benefit situation.

I am going to try and save my attempted solution ( http://lesswrong.com/lw/39a/unpacking_the_concept_of_blackmail/342c?c=1 ) from being stuck at the bottom of the thread. This might be inappropriate behavior, and if so please inform me.

I think you've answered your own question in your comment.
