This article uses a thought experiment to probe the dangers of public online communication, then describes an ontology around “malicious supporters” and possible remedies.

It's arguably a more obvious fit for LW than the EA Forum, but I posted it there for reasons explained in the comments. I think people here might also find it interesting though.

If you are a consequentialist, you probably should avoid discussing porridge breakfasts. [...] Things could get worse if this person had an agenda. They realize they have power over you.

I don't negotiate with terrorists! Whether or not this person consciously thinks of themselves as having an agenda, if their behavior is conditioned on mine in a way that's optimized for controlling my behavior—the "I just wanted to let you know" message definitely counts—then I must regard them as an extortionist, who is only threatening harm because they expect the threat to succeed. It would be awfully short-sighted of me to let them get away with that—for the end of that game is oppression and shame, and the thinker that pays it is lost!

The function of speech is to convey information—to build shared maps that reflect the territory. The reason speech is such a powerful cognitive technology is because accurate beliefs are a convergent instrumental value—whatever you're trying to do, you'll probably do a better job if you can make accurate predictions. When I contribute accurate information to the commons, I don't know all the various downstream consequences of other agents incorporating that information into their maps—I don't see how I'm supposed to compute that. Even if "You should always choose the action with the best consequences" would be the correct axiology for some superintelligent singleton God–Empress who oversees the whole universe and all the consequences in it, I'm not a God–Empress, and "Just tell the goddamned truth" (with the expectation that this is good on net, because true maps are generically useful to other agents, almost none of whom are evil) seems like a much more tractable goal for me to aim at.

Things arguably get more complicated when the aggressor thinks of themself as being on your side.

What does that even mean? I read lots of authors, including a lot of people who I would personally dislike, because I benefit from reading the information that they have to convey. But they don't own me, and I don't own them. Obviously. What's this "side" thing about? Am I to be construed as being on the "side" of the so-called "rationalist" or "effective altruism" "communities" just because Eliezer Yudkowsky rewrote my personality over the internet twelve years ago? God, I hope not!

Option 4 [Don't Censor] [...] seem fairly common though deeply unfortunate. It’s generally not very pleasant to be in a world where those who are listened to routinely select options 4 [...]

It's not very pleasant to live in a world with terrorists trying to control what people think! And any sane blame-allocation algorithm puts the blame on the terrorists, not the people who are trying to think!

This person writes a tweet about food issues, and then a little while later some food critic gets a threat. We can consider this act a sort of provocation of malicious supporters, even if it were unintentional. [...] But we can speculate on what they might have been thinking when they did this.

I agree with Dagon that loudly condemning the malicious actors is the right play, but I'll accept that it's not enough to prevent harm in the least convenient possible world.

In that world, my actual response is that it's not my fault. It's bad for food critics to get threats! I unequivocally condemn people who do that! If there's some sort of causal relationship between me telling the truth about food, and food critics getting threats, that runs through other agents who are not me who won't stop their crimes even if I condemn them ... well, that's a really terrible situation, but I'm not going to stop telling the truth about food. I don't negotiate with terrorists!

I don't see how the usual rationale for not negotiating with terrorists applies to the food critics case. It's not like your readers are threatening food critics as a punishment to you, with the intent to get you to stop writing. Becoming the kind of agent that stops writing in response to such behavior doesn't create any additional incentives for others to become the kind of agent that is provoked by your writing.

Similarly, it seems to me "don't negotiate with terrorists" doesn't apply in cases where your opponent is harming you, but 1) is non-strategic and 2) was not modified to become non-strategic by an agent with the aim of causing you to give in to them because they're non-strategic. (In cases where you can tell the difference and others know you can tell the difference.)

Thanks (strong-upvoted), this is a really important objection! If I were to rewrite the grandparent more carefully, I would leave off the second invocation of the "I don't negotiate ..." slogan at the end. I think I do want to go as far as counting evolutionary (including cultural-evolutionary) forces under your "modified to become non-strategic by an agent with the aim [...]" clause; but, sure, okay, if I yell in a canyon and the noise causes a landslide, we don't want to say I was right to yell because keeping silent would be giving in to rock terrorism.

Importantly, however, in the case of the harassed food critic, I stand by the "not my fault" response, whereas the landslide would be "my fault". This idea of "fault" doesn't apply to the God–Empress or other perfectly spherical generic consequentialists on a frictionless plane in a vacuum; it's a weird thing that we can only make sense of in scenarios where multiple agents are occupying something like the same "moral reference frame". (Real-world events have multiple causes; consequentialist agents do counterfactual queries on their models of the world in order to decide what action to output, but "who is 'to blame' for this event I assign negative utility" is never a question they need to answer.)

But I think blame-allocation is a really important feature of what's actually going on when crazy monkeys like us have these discussions that purport to be about decision theory, but are really about monkey stuff. (It's not that I started out trying to minimize existential risk and happened to compute that going on a Free Speech for Shared Maps crusade was the optimal action; as you know, what actually happened was ... well, perhaps more on this in a forthcoming post, "Motivation and Political Context of my Philosophy of Language Agenda".) I have to admit it's plausible that a superintelligent singleton God–Empress programmed with the ideal humane utility function would advise me to self-censor for the greater good. And coming from Her, I wouldn't hesitate to take that advice (because She would know). But that's not the situation I'm actually in! In the "Provoking Malicious Supporters" section of the post, Gooen writes, "This closely mirrors legal discussions of negligence, gross neglect, and malice", but negligence and neglect are blame-allocation concepts, not single-agent decision theory concepts!

In accordance with the theory of universal algorithmic bad faith, we might speculate that some part of my monkey-brain is modeling "posts that imply speakers should be blamed for negative side-effects of their speech" as enemy propaganda from the Blight dressed up in the literary genre of consequentialism, for which my monkey-brain has cached counter-propaganda. The only reason this picture doesn't spell complete doom for the project of advancing the art of human rationality, is that the genre constraints are actually pretty hard to satisfy and have been set up in a way that extracts real philosophical work out of monkey-brains that have Something to Protect, much as a well-functioning court is set up in a way that extracts Justice, even if (say) the defendant is only trying to save her own neck.

Useful exploration for any somewhat-intellectual community which prefers not to cause (or be blamed for) malicious behavior.

I think you're missing an option, though.  You can specifically disavow and oppose the malicious actions/actors, and point out that they are not part of your cause, and are actively hurting it.  No censorship, just clarity that this hurts you and the cause.  Depending on your knowledge of the perpetrators and the crimes, backing this up by turning them or actively thwarting them may be in scope as well.

Don't censor yourself at all - that's a (possibly unintentional, but that doesn't matter) blackmail response that does more harm than good.  

I think you're missing an option, though. You can specifically disavow and oppose the malicious actions/actors, and point out that they are not part of your cause, and are actively hurting it. No censorship, just clarity that this hurts you and the cause. Depending on your knowledge of the perpetrators and the crimes, backing this up by turning them or actively thwarting them may be in scope as well.

There is a practical issue with this solution in the era of modern social media. Suppose you have malicious actors who go on to act in your name, but you never would have associated yourself with them under normal circumstances because they don't represent your values. If you tell them to stand down or condemn them, then you've associated yourself with them, and that condemnation can be used against you.

To be clear, "stand down" is not condemning.  "F them and their destructive actions" is condemning.  In more formal settings, "I do not support X, and I do not want anything to do with people doing X".

A few examples of clear condemnation being used against someone, where that retaliation is worse than the implied association of doing nothing, would help me understand your comment.

Note that if they're not ALREADY associated with you in some way (through their actions and publicity, referencing your reputation without your consent), you don't need to respond in any way.  That's a pretty easy option 4, I think.

Yeah, this also seemed to me like the primary alternative missed in that section.

Thanks for the feedback!

That sounds similar to what I called "Option 3": "You gradually change or improve the community while doing minor self censorship."

I think that doing this is highly challenging and very far from trivial. For one, online you can often barely tell who your followers are. I think that one should try to do things like you mention, but I don't think it's enough in most settings for people with sizeable audiences (thousands of people or more).

I think I was misled by the words "gradually" and "community" in option 3.  I think of direct opposition to the bad actions as a distinct option.  It does improve the community (by removing the bad), and I guess it's gradual because there's always more, but it didn't feel the same to me.

I don't claim it's trivial, but it's not impossible - you know about the problem, because it's the problem you're reacting to!  "the criminals who do X are not part of our community - everyone please shun them" is a minimum, and in some cases you can follow up with actual specifics.

Note - I don't have any significant public presence, so I may be severely underestimating the complexity.  I still think the discovery problem is inversely correlated with the severity of the problem itself.  Note also that I'm not paying much attention to the EA Forum, so if there is a specific problem that this is generalizing from (which is implied but not explained in the EA comments), it may be different from the examples available to me.

Another option not discussed is to control who your message reaches in the first place, and in what medium. I'll claim, without proof or citation, that social media sites like Twitter are cesspits that are effectively engineered to prevent constructive conversation and to exploit emotions to keep people on the website. Given that, a choice that can mitigate these kinds of situations is to not engage with these social media platforms in the first place. Post your messages on a blog under your own control or a social media platform that isn't designed to hijack your reward circuitry.
