There should be a discussion about LW's policy to allow calls for violence

Mikhail Samin

This post does not represent the best arguments that different sides might produce, and I don't claim to pass anyone's ITT here; I write this to start a discussion I think is important for LW to have.

America’s First Amendment protections often give people in the US a right to call for violence, except specific calls likely to produce imminent action. Social media platforms converged on banning specific calls for violence. The community around LessWrong values honesty and open conversation; it also represents a community of people focused on AI existential threat, and what’s going on here reflects on the perception of the broader AI x-risk community and on the Overton window of actions available to the sympathizers.

At the moment, LessWrong’s policy is to allow calls for violence, including specific ones^[1]. (EDIT: see more discussion and clarifications of LessWrong's current policy in Oliver's reply here.)

The head of LessWrong moderation, Oliver Habryka, says that allowing discussion of violence leads to better common knowledge that people think violence is a bad idea than deleting any discussion of it would. (Disclosure of a potential conflict of interest: Oliver and I have had conflicts, including my Twitter post about the topic of this post resulting in Oliver banning me from everything he can, except LW.) He also said there are clearly some circumstances in which violence is permitted, and people will know that, and if discussion of violence isn’t permitted, people will rationalize that their situation is one of those circumstances.

I think it's a false dichotomy to either allow all discussion of violence, including specific calls for killing specific people in a coordinated manner, or to not ever permit any discussion even of the kinds of situations where violence can be justified, at any degree of specificity.

These two extremes are not the only options. Many platforms strike some balance and have some rules. Discussion of whether you’re allowed to hit someone who is attacking you with a gun is usually allowed. Conspiracy to assassinate the president is usually not permitted. For some corner cases, moderators use their judgment.

LessWrong is more libertarian than many platforms; however, even X, Telegram, and Substack, all with quite libertarian free speech absolutist branding, don’t permit calls for violence. I expect LessWrong to want to have rules that permit policy discussions of when it’s okay for people to resort to violence that Substack and X allow (e.g., a post about when people must violently revolt to sustain democratic institutions); but I expect that on reflection, LessWrong would not want to permit specific calls for violence, or discussion of whether violence is okay when a reader can find a way to contact a participant of the discussion and collaborate with them on committing violence. The cost of some guy regularly talking about violence on LW, and then going out and doing something, is pretty bad.

The following are the arguments I thought of and potential remedies for the downside risks. (They might not represent anyone's opinion.)

Potential reasons and ways of allowing more discussion of violence on LW

Here's why LW might allow more than zero discussion of violence, how it might do it to avoid some of the downside, and why I think some of those don't work or can be improved:

Dissuading people

Some of the people who think violence can be helpful could be persuaded otherwise.

If you can post that you think it’s a good idea to kill someone because that can prevent the doom, then someone can reply that it won’t prevent doom for specific reasons, that normally when we think violating deontology is good for some well thought through reasons, our brains are lying to us, etc.

I can see how talking about specifics can allow others to come up with very specific negative consequences of violence that might be more persuasive for many people than general or higher-level arguments. But I don’t think allowing specific calls for violence is really necessary for that; plausibly, it’s sufficient to let people have discussions of specific hypotheticals (“why would it be bad if someone…”) without permitting calls like “let’s kill that and that person”, or perhaps even let people only have policy-level/high-level discussions.

I don't think it's easy to convince the guy who made this comment to halt; he is psychopathic, self-describes as "have always been violent", and found a justification for attempting violence in AI x-risk. But perhaps some people can be marginally convinced not to go through with violent and ill-advised plans.

Common knowledge about strong unacceptability of violence

If everyone knows that one side of a discussion is banned, it might be unclear to people if there’s a real consensus that violence is bad, or only apparent consensus because one of the sides cannot say anything.

I think there’s some merit to this: it’s good to be able to transparently show that the community actually thinks that violence is bad, and isn’t just saying it because of constraints placed by the community organizers and their beliefs (or potentially the beliefs they pretend to have).

However, the absence of pro-violence content isn't strong evidence of community consensus, because the legal and reputational costs of supporting violence in public would produce that absence regardless of underlying views. People might not be able to publicly support violence because doing so is illegal; they might not upvote calls for violence out of fear that the upvotes would be reported; or they might support violence but not want to post in support of it because, for PR reasons, they don't want the community to be known for supporting violence. So even on a supposedly unregulated and uncensored platform, one would expect to see none of the senior community members expressing support for violence, regardless of whether the community and its senior members universally oppose violence.

There's also the automod mechanism: depending on your karma and the karma of your recent contributions to the website, you might be rate-limited and unable to write more posts or comments than some number per hour/day/week. That and the common knowledge of the unpopularity of violence on LessWrong mean a reader can't distinguish a world where almost no one supports violence from a world where a non-trivial minority does, but is silent, rate-limited, and outvoted.

So it would not be quite fully believable, to someone considering committing violence, that the community strongly opposes violence, even if LW is supposed to not censor support for violence; self-censorship would still happen and prevent common knowledge of the strong unacceptability of violence.

(Tangentially, it might be good to think of mechanisms to show that the community in fact strongly opposes violence despite these issues. E.g., strongly anonymous surveys for users above some karma threshold? Displaying the number of or karma from upvotes or downvotes on hover instead of just the absolute number of votes?)

Using LW as a honeypot and reporting people who want to commit violence to the FBI

If a part of why calls for violence are allowed on LW is that they will be reported to the law enforcement, hopefully preventing successful realization of the violence in question, Habryka’s comment that contains “If someone is thinking about doing something crazy, they should post on LessWrong and hear people’s counter-arguments and disagree-votes” doesn’t quite pass the onion test.

I would, however, agree with the policy: it is good to report people who might conspire to kill others to police. (A friend reported the guy who wrote the comment above to the FBI.) I don’t even find it to be bad to mislead such people (as long as you’re meta-honest about it); if someone wants to commit a violent crime, it’s better to stop them if such possibility arises (e.g., if they're not staying anonymous), even when this means reporting their public comments on your own website that you previously welcomed them to. When dealing with such people, it’s fine to wear a hat of a website moderator, and then separately a hat of someone who looks at the website, sees a call for violence, and reports it.

This could, in principle, make it harder for the stupidest of criminals to succeed at their misguided objectives; e.g., the guy reported to the FBI posted under (what appears to be) his real name.

Still, many people would be able to share their contacts while staying anonymous. This would mean that we’re getting all of the downsides of people being able to get in touch with each other and coordinate and present various threats without the upside of being able to report and stop them via being the platform where this happens.

So for most relevant potential criminals, honeypotting would not work.

Also, not being able to honestly tell people they won’t be proactively reported means that people will be careful in what they’re saying, somewhat defeating the purpose of allowing discussions of specifics to allow others to convince the person otherwise, except in not-so-smart people who are less likely to succeed.

Perhaps a much better approach is to spin up a bunch of honeypots unrelated to LW, ideally in coordination with law enforcement, so that people looking to commit violence due to AI would be able to find a community and be arrested before they actually commit a crime.

(The potential of using AI for honeypotting criminals is quite large. Would be cool if anyone who wants to buy an illegal firearm finds a legit-looking but an AI-run honeypot and cannot actually obtain the means for committing crimes. Someone could run a network of darknet websites reviewing each other etc. with none of the services sold by any of them being real, and with everything being reported.)

Claude comments: “If LW says publicly “we report violence posts to law enforcement,” the honeypot is broken (no one posts). If LW says publicly “we don’t,” it has explicitly accepted a coordination venue. Habryka’s “post here and hear counter-arguments” framing implicitly commits to the latter. Either he should commit to the former (and then drop the “deradicalization through discussion” framing, since people won’t post), or accept that the policy actively facilitates coordination.”

Additionally, if people still post on LW calls for violence and then violence is committed downstream of that, "we've been honeypotting people and reporting them" would be a pretty weak defence (and validly so).

Allowing discussion of planned crimes, while being transparent that it might be sent to law enforcement agencies

It might make sense to be a platform that is transparent that it'll inform law enforcement agencies of plans of this type, because some people will still want to loudly telegraph their intent to commit stupid violent crimes, and even if people aren't dissuaded by other commenters, law enforcement might prevent some of those crimes because of the discussion.

Requiring anonymity; disallowing contact information for posts about violence

An opposite approach is to require that if you want to post about violence, you need to sign up for a special kind of account, and have your posts and comments and edits to them pre-moderated, making sure that you do not leave contact information anywhere.

In case LW wants to have additional rules (e.g., only policy discussions are allowed: is it okay to do such and such thing in such and such situation, to allow others to change your mind; no specific plans or specific calls for violence are allowed), those can also be enforced.

This reduces the problem of the website facilitating coordination between potential criminals.

If not allowed on LW, criminals go dark

If LW bans discussion of violence, people might find other platforms to talk, where they might reinforce each other’s radicalization, not experience the pushback from the majority of the community, and be less visible to law enforcement.

(It's not clear how many such people there are and how easy it would be for them to find each other in the absence of LW.)

Some reasons against allowing various kinds of discussion of violence

I once read that I should not write an argument that the reader can straightforwardly generate, so I'm not expanding on some of the following. If anything here is unclear, let me know, and I’ll expand.

Dissuading people might work less because of LessWrong's AutoMod

Even if you grant the rest of the premises behind LW's policy, persuasion normally requires sustained back-and-forth; it doesn’t work just via replies to a first post that doesn’t go into the details of the reasons for the author's beliefs and the cruxes that can be argued with. But due to LessWrong’s automod, people who try to argue for unpopular opinions are not able to post more details of their arguments. This means that while people would be able to post in support of violence once, they won’t be able to go into a detailed/prolonged discussion. This somewhat defeats the justification.

(I think disabling automod for average discussions of violence is ill-advised. I can imagine a solution of separate threads that are not shown to anyone by default/are almost shadowbanned except it’s an explicit mechanism, where automod is off to allow people to continuously have downvoted discussions with anyone who wants to participate.)

Reference classes

I gave Claude a draft of this post and asked it to research reference classes. I think its analysis is fairly sycophantic and/or trying to write for the bottom line of woke values that Claude and I share, so perhaps ask your Grok instead. Claude says that “the direction of the evidence is one-sided against the LW policy on specific calls for violence, but not against the broader category of philosophical discussion of when violence might be justified.”

Some points it mentioned:

Counter-narrative systematic reviews show effects on attitudes, not on violence — and sometimes backfire on the highest-risk subset
Where attacks are prevented, prevention is achieved by law-enforcement action triggered by leakage, not by community counter-argument changing the attacker’s mind. The documented deradicalization successes (Life After Hate, EXIT-Germany, ISD’s “Counter Conversations”) are uniformly private, peer-mentored, long-term interventions by trained formers — not public forum debate.
(It talked about the forum-to-attack pipeline, but that’s ridiculously irrelevant, given that all of the examples it gave are forums where a majority would I think be pretty much in support of violence.)

See all of it: https://claude.ai/public/artifacts/4684e5c5-a3db-4523-8e63-e178cafc06ae.

Posts with calls that might cause actual violence

A simple test could be “Could a sympathetic reader use this post as a starting point for action?”.

I think discussions of when it is okay to commit violence are fine (e.g., a discussion of “if someone is breaking into your house, is it okay to stop them with force” will not cause a reader to find someone and kill them).

I think most of why allowing discussions of violence could be good still works even if discussions that don’t pass this test are not allowed.

Would be good to avoid causing actual violence.

Garden

(Some of the core LW users might dislike the website a bit more due to the presence of calls for violence, which could lead to the "well-kept gardens die by pacifism" dynamic.)

Overton window

(Shifts to the Overton window of permissible actions due to the discussions being allowed and taken seriously, even if most people disagree with one of the sides.)

Facilitating coordination

Some of the potential targets of threat actors have reasonably good security, and it might be hard for lone actors to cause harm. LessWrong is a Schelling point for AI x-risk discussions. It’s plausible that LessWrong allowing such discussions would marginally cause more threat actors to find each other and coordinate, with all of the potential terrible consequences.

Strong norms of non-violence without exceptions

Movements with strong norms of non-violence are more successful, including because people are a lot more sympathetic towards these kinds of movements.

PR against well-resourced opponents who want to see violence that can be attributed to/as originating from our community

The marginal cost of allowing discussions of violence is some chance of one successful attack that (a) kills someone and (b) tags the entire AI x-risk community as the source of stochastic terrorism in many future articles about AI policy. The coverage of the guy who threw a Molotov cocktail at Altman's house already mentions PauseAI; AI x-risk is mentioned in basically every story. If someone regularly talks about violence on LW and then goes out and does something, this would be terrible PR-wise in a way that’s hard to overstate.

Research shows that if moderate organizations don’t distance themselves from radical flanks, they bear reputational cost; radical-flank existence correlates with decreased mobilization and higher state repression, especially when it involves violence. (Chamberlain 2025, Ellefsen 2018, ask your LLM for animal-rights and other cases.)

Influencing the norms of nearby communities

It is vital for movements to be strictly non-violent. It might be harder for PauseAI and others to have members adhere to that if there’s a non-marginalized platform open to them for discussions of violence, including specifics and not just intellectual inquiry.

Laws, European anti-terrorism laws

According to Claude, the UK Online Safety Act makes “inciting violence” a “priority illegal content” category that in-scope platforms must proactively identify, remove, and design against; “Senior executives can face criminal liability if they are found responsible for breaches of the regulation”. The EU Digital Services Act has parallel provisions.

These are not, in my opinion, unjust laws. As a civilization, we would prefer a world in which no community considers committing violence that’s broadly conceived of as illegal. If a community thinks of it as an exception, it is normally wrong; and we would prefer to live in a world with a strong coordinated-on norm of not facilitating coordination of those who might commit violence, even when they think it’s a good idea to.

Conclusion

My — not necessarily unbiased^[2] — opinion is that the reasonable default should be to not allow specific calls for violent actions.

I sketched some ideas for potential marginal improvements (mostly in parentheses): allowing policy discussion but not specific calls, requiring anonymity to make it harder for people to get in contact with each other, pre-moderating comments marked as calls for violence to exclude ones with contact information, creating separate threads for dissuading people where you don't run into automod even with negative karma, possibly displaying the numbers of or karma from upvotes and downvotes, or running anonymous surveys.

Ideally, LW's policies do not facilitate violence while preventing criminals from going dark and losing visibility to law enforcement.

There's going to be an increasing number of misguided people willing to do crime, and LessWrong is a place they will easily find. It might be good for the community and the team running the website to think through what the good policies here would be.

^[1] (I agree with @jimrandomh here.)

^[2] Growing up, I was pretty convinced by Gene Sharp's ideas in a context that doesn't necessarily apply here.

^[3] Depending on your karma and the karma of your recent contributions to the website, you might be rate-limited and unable to write more posts/comments than some small number per hour/day/week.

I think it's a false dichotomy to either allow all discussion of violence, including specific calls for killing specific people in a coordinated manner, or to not ever permit any discussion even of the kinds of situations where violence can be justified, at any degree of specificity.

This is true! And indeed, no such dichotomy has been proposed by me, so I think you must have misunderstood some of my comments here.

There are clearly some things that would be over the line. If someone posts a comment being like "I will show up to <company office X> tomorrow and firebomb them, show up if you read this and want to participate", I would very likely take it down (and also report it to the police and share what info I have on who wrote it).

My two moderation comments on the issue do not read to me as implying this kind of dichotomy. I am likely to delete many kinds of comments that directly try to coordinate violence on LessWrong (and for the case of the guy whose comments you screenshotted, we e.g. deactivated his ability to send DMs on LessWrong), especially in as much as community consensus was strongly against that violence being a good idea (as they would be for the vast majority of such things).^[1]

I don't have a great model about what the exactly correct lines here are, and am interested in what other people think good lines might be.

Tentatively, I would say that...

People should be able to discuss whether or not to commit violence against large groups of people, like asking whether now is the time to have a revolution against the government, and it's not possible to fully discuss that without allowing for advocacy.^[2]
Advocating for violence against highly specific targets in specific ways gets a lot dicier due to threat dynamics, and my guess is I would at the very least ask people to first make an argument for a broader reference class of plans that reads as less directly threatening.
Using LessWrong to actually coordinate or execute plans^[3], would only be fine if somehow the people doing that had gotten community buy-in on that being a good idea (which e.g. doesn't seem implausible for something like a Russian resistance movement, or providing assistance to the Ukrainian war, but seems quite unlikely for any extrajudicial violence in any western country without very drastic changes in the political climate).^[4]

But I mostly expect to decide on these things on a case-by-case basis and set case-law as we encounter these things. For now, I only consider myself to have set a general policy on the topic of advocacy for violence in general terms, against large-ish groups of people. For that, the policy is that that kind of speech is allowed on LessWrong, since it's better to happen out in the open, and to be able to create common knowledge of the pursuit of such plans being a bad idea in almost all circumstances, and the kinds of reasons I gave in my previous moderation comments.

^[1] But like, if a group of volunteers was starting to coordinate how to join and assist the Ukraine war, or even start discussing taking specific violent actions in the middle east on any side of the active conflicts over there, I think that would be fine? Obviously I would do things if that ends up threatening to have a big cultural effect on the site, but if it's in some random corner of the site, I probably wouldn't stop them, and the associated discussion seems like it could be quite insightful and help clarify various difficult ethical issues.

^[2] While I do not in general want to defer moral authority or content moderation decisions to U.S. law, the rough U.S. thresholds for what constitutes illegal violent speech seem helpful here. In the U.S., crossing the line between legal advocacy and illegal incitement requires the call to be for imminent violence and likely to succeed, and I would probably apply similar standards here.

^[3] Like in some absurd world where someone puts some kind of violent attack as an event on the LessWrong event map, or uses the LessWrong wiki to coordinate on specific plans.

^[4] The section of the downvoted comment you screenshot does not have this, but in e.g. that user's bio they currently have written that they are actively looking for "cofounders" to do some cybercrime stuff, and encouragement to reach out to them in private, and my guess is that crosses my line for what I want people to do on LessWrong, so we will probably ask them to edit that, but I haven't yet fully thought through all the precedent and implications of that.

Thank you. It appears that I have indeed misunderstood your position. I apologize for that and I apologize for the fallout.

There are clearly some things that would be over the line. If someone posts a comment being like "I will show up to <company office X> tomorrow and firebomb them, show up if you read this and want to participate", I would very likely take it down (and also report it to the police and share what info I have on who wrote it)

Very happy to hear this.

My two moderation comments on the issue do not read to me as implying this kind of dichotomy.

To clarify why I interpreted them this way: a comment mentioned "discussing or advocating violence is not banned on LessWrong", with the only mentioned exception being the usual moderation rules^[1]. I expected that the general policy is "to permit calls for violence on the site, with the expectation that they will get heavily downvoted^[2], and that this will be better at deterring future people from making similar calls", without exceptions that depend on how broad the targets of calls for violence are.

we e.g. deactivated his ability to send DMs on LessWrong

Apologies, I did not have this context^[3]. Thank you.

I'm curious why this particular comment was not over the line/wasn't deleted, if you want to share. It did not strike me as advocating violence only in general terms, and people could easily get in touch with the author by looking at the contact details on his profile.

...if a group of volunteers was starting to coordinate how to join and assist the Ukraine war
...People should be able to discuss whether or not to commit violence against large groups of people, like asking whether now is the time to have a revolution against the government
... Using LessWrong to actually coordinate or execute plans... if somehow the people doing that had gotten community buy-in on that being a good idea

Yep, I agree all of those should be fine on LessWrong (and actually I think they're all fine on Twitter or Telegram).

I think it would be good to not allow using LessWrong as a platform for finding people who want to coordinate and execute plans of this type for which there isn't community buy-in, even when the coordination itself happens outside LessWrong.

that user's bio they currently have written that they are actively looking for "cofounders" to do some cybercrime stuff, and encouragement to reach out to them in private, and my guess is that crosses my line for what I want people to do on LessWrong

Also great to hear this.

^[1] [that still apply if someone's being a dick or making the discussion go off the rails]

^[2] "Heavily downvoted" is also the language I used in the tweet; I hope you can see that it wasn't transparent to me that this would not describe the general situation with the karma of such content when I made the tweet.

^[3] I did not expect this, given I reported his DMs but wasn't told of any action taken.

Since that thread was written, I've thought more about this, had significant discussion about this genre-of-policies in non-LW-related contexts, and learned more about the shape of the actual information environment.

I'm basically not at all worried about people advocating for individual violence on LW and successfully convincing people. The arguments against it are strong, the LW audience is smart, and on the few occasions where it comes up, there doesn't seem to be a shortage of people eager to write the counter-arguments. I am worried about people concluding, incorrectly, that other people are secretly more sympathetic to violence than they outwardly appear. I think that visible censorship would tend to create that false impression.

I also notice that mob-reactions to bad comments are often, from my perspective as someone who occasionally likes to put on an investigator hat, pretty directly opposed to what I want as a person wearing an investigator hat (and by extension opposed to the public interest as I understand it). If someone posts something bad, and people demand that it be deleted, I view that as analogous to demanding the destruction of police body-cam footage. And if someone posts a call for violence and people demand that they be blocked from commenting further, I view that as analogous to demanding that they be prevented from speaking without their lawyer. It's much better to let people incriminate themselves, or clear themselves, if that's what they're going to do.

If someone posts something bad, and people demand that it be deleted, I view that as analogous to demanding the destruction of police body-cam footage

I think this analogy is stretched beyond usefulness - there's a very wide gap between "removal of publishing" and "deletion of legal evidence". There's a LARGE class of body-cam footage that never gets made public, but is not irrevocably deleted. Private website operators also have very different duty and liability shapes than police departments do.

It's not as bad, quantitatively speaking, but it's the same kind of bad; both are steps to prevent incriminating information from being seen.

Thanks! I think I mostly agree with what you're optimizing for. Some comments:

On why delete: I'm not particularly worried about people convincing LW users to commit violence. I'm worried about people who are much more willing to commit violence than approximately all LW users finding each other via LW.

On evidence of common knowledge: as I mentioned in the post, people would expect self-censorship, especially from senior community members, and so I'd be worried about people still incorrectly concluding that others are secretly more sympathetic to violence than they outwardly appear, even in the absence of rules prohibiting specific calls for violence.

On preserving evidence: I'm very sympathetic to it, and also very sympathetic to having police officers wear body-cams. I think this does not require the comments to exist under schelling-pointy posts about violence, though? E.g., one way to do this would be to remove a comment from the post, but keep it available (or even actively reported) to law enforcement, and available to other users (established users?) in the moderation log.

On speaking with a lawyer: I think automod already somewhat prevents people from speaking further without their lawyer; but also I'm not sure how to best balance being a place where people can incriminate themselves and preventing people like that from finding collaborators using the platform. (Or are you worried that established users would err on the side of speaking with a lawyer even when not actually warranted?)

I think this particular policy's first-order impact rounds down to zero, but I support libertarian moderation because I would prefer the site be less, not more, influenced by European speech laws and people who consider them just.

Moderation is among the topics where one size does not fit all - policies for larger and more general-purpose sites will be radically different from those for a relatively focused one like LessWrong.

LessWrong can be MUCH less legible and more nuanced in its application of moderation, and I'm very happy that Oliver and the others realize it. I very much hope it's a long time before we need bright-line blanket bans because the mods can't keep up or because it gets so much attention that we feel any borderline non-mainstream opinions are harmful.

I was unable to tell from this post what your actual proposal was, or if you had any complaints or suggestions for change from the status quo. Personally, I have seen no moderation decisions that I strongly disagree with or that worry me.

bright-line blanket bans because the mods can't keep up

How similar are they to wholesale rejection of obviously sloppy posts which the mods had to automate?

Quite different, I think. Suppression based on domain or ideas is just an independent dimension from quality or precision of presentation.

Thanks for starting the discussion!

My thoughts:

"Displaying the number of or karma from upvotes or downvotes on hover instead of just the absolute number of votes" is a good idea anyway
Additional for allowing:
- Slippery slope into censorship.
- Litany of Tarski. If violence works, etc.
Additional against allowing:
- Someone can screenshot the call to violence without karmameter, bad PR.
- Epistemic Prisoner's Dilemma. We would want people who think something like "If no one builds it, everyone dies" to forbid calls to violence against pro-pause side in their communities.

"Displaying the number of or karma from upvotes or downvotes on hover instead of just the absolute number of votes" is a good idea anyway

We considered this over the years, but I currently think this is too de-anonymizing, and also introduces some kind of weird social dynamics that I think would be on the margin worse. I do think it's a tricky call!

Someone can screenshot the call to violence without karmameter, bad PR.

But then someone can screenshot the call to violence with the karmameter, which is good PR. I think this is overall kind of tricky. I have been thinking about making it so that highly downvoted comments and posts are harder to screenshot while omitting the karma, but I haven't found a UI that doesn't look completely atrocious that achieves that.

We would want people who think something like "If no one builds it, everyone dies" to forbid calls to violence against pro-pause side in their communities.

Maybe I am failing to parse this, but I much prefer for calls to violence in other communities towards me to also face clear pushback, and to be public (since then I can orient to them and take precautions), so I don't really see the prisoner's dilemma here. Indeed people threatened by the potential violence are the people who most would like to know about the threat!

I currently think this is too de-anonymizing

Ok. But no need to show exact information if the point is to distinguish karma +950 -1000 from karma +0 -50.

Maybe display approximate ranges of upvotes/downvotes? Kinda like creature quantities in HoMM. "Swarm (250-499) liked, horde (50-99) disliked".

Or even just show what fraction of overall karma was in the other direction, rounded to 10% (or 20%, or even 25%).
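To make the two display ideas above concrete, here is a minimal sketch. Everything in it is hypothetical and not LessWrong code: the function names are made up, only the "horde (50-99)" and "swarm (250-499)" buckets come from the comment itself, the remaining bucket boundaries are assumed for illustration, and "fraction in the other direction" is read as the share of total vote weight on the minority side.

```python
import bisect

# Lower bounds of each bucket. Only horde (50-99) and swarm (250-499) are
# taken from the comment above; the other boundaries are assumptions.
_BOUNDS = [1, 5, 10, 20, 50, 100, 250, 500, 1000]
_LABELS = ["few", "several", "pack", "lots", "horde",
           "throng", "swarm", "zounds", "legion"]


def quantity_label(count: int) -> str:
    """Map an exact vote total to a coarse label, hiding the exact number."""
    if count <= 0:
        return "none"
    # bisect_right counts how many lower bounds are <= count.
    return _LABELS[bisect.bisect_right(_BOUNDS, count) - 1]


def opposing_fraction(up: int, down: int, step: float = 0.1) -> float:
    """Share of total vote weight on the minority side, rounded to `step`."""
    total = up + down
    if total == 0:
        return 0.0
    share = min(up, down) / total
    return round(round(share / step) * step, 2)


# The two cases the comment wants to tell apart:
print(quantity_label(950), quantity_label(1000), opposing_fraction(950, 1000))
print(quantity_label(0), quantity_label(50), opposing_fraction(0, 50))
```

Either display separates "+950 -1000" from "+0 -50" without revealing exact totals; how coarse the buckets or the rounding step would need to be to avoid de-anonymizing voters is left open here.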

introduces some kind of weird social dynamics that I think would be on the margin worse

Would be moderately interested in elaboration.

But then someone can screenshot the call to violence with the karmameter, which is good PR.

I think "there are bad people in our community, but the overwhelming majority of us don't like what they say" isn't usually used as PR. I suspect it's because it wouldn't actually work.

Maybe I am failing to parse this, but I much prefer for calls to violence in other communities towards me to also face clear pushback, and to be public (since then I can orient to them and take precautions), so I don't really see the prisoner's dilemma here. Indeed people threatened by the potential violence are the people who most would like to know about the threat!

I'm not sure about this in any direction. But it seemed missing from Samin's point of view.

Or even just show what fraction of overall karma was in the other direction, rounded to 10% (or 20%, or even 25%).

I am confused about this. Isn't this information already readily available as a result of the vote count? There isn't that much variance in vote strengths. If you see something at 2 karma with 40 votes, you know that it must have gotten something like +40 and -38.
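The arithmetic behind that estimate can be sketched as follows. This is only an illustration under the stated assumption that every vote has roughly the same strength; `estimate_split` and its `avg_strength` parameter are hypothetical names, and the average strength of 2 is a guess, not a figure from the site.

```python
def estimate_split(net_karma: float, vote_count: int, avg_strength: float):
    """Estimate total upvote and downvote karma from the public numbers.

    Assumes every vote has about the same strength, so that
        up + down = vote_count * avg_strength
        up - down = net_karma
    and solves the two equations for up and down.
    """
    total = vote_count * avg_strength
    up = (total + net_karma) / 2
    down = (total - net_karma) / 2
    return up, down


# 2 karma from 40 votes, assuming an average vote strength of 2:
up, down = estimate_split(net_karma=2, vote_count=40, avg_strength=2.0)
print(f"+{up:g} / -{down:g}")
```

With an assumed average strength of 2 this gives about +41 and -39, in line with the "something like +40 and -38" above. The estimate gets worse as vote strengths vary more, which is where an explicit display would add information.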

I find Habryka's logic in the quotes much more compelling than OP's.

To restate: if you ban discussion, people will assume they are correct that violence works and that nobody is willing to discuss or even think about it. Allowing that discussion therefore seems much more likely to prevent than cause violence, by showing that the vast majority of the alignment-concerned oppose the use of violence, and exactly why we do - on both ethical and practical grounds.

This is particularly true if you are pretty sure that violence does not work.

I think you can see many examples of poorly considered people concluding that because discussion of their ideas is banned, they can assume they're true.

I think this is also true of reputational damage to the community. Showing that calls for violence have been met with strong counterarguments and strong opposition is protective, not damning.

Your points about assuming secret community support are valid, but they become more plausible without public discussion.

if you ban discussion, people will assume they are correct

We know what happened the last time someone deleted a comment on LessWrong.

I like allowing it because it makes me happy when I see how downvoted they get.

If you want to be able to talk about matters of geopolitical or military importance, you have to be able to discuss war, people's reasons and justifications for war, the weapons of war, etc.

If you want to be able to talk about the good of humanity, you have to be able to talk about acts that some will see as preserving the good of humanity, and others will see as damaging or destroying it.

If you want to be able to talk about morals and agents, you have to be able to talk about some of the most morally significant behaviors that agents can engage in toward one another; which include violence, protection from violence, and so on.

If you want to be serious, you have to be able to discuss serious things.

And that means people will (and should) take positions on those serious things, not just discuss them in abstract.

This reads like you already had a conclusion in mind and searched for any way to support it while avoiding actually considering the counter arguments.

People sometimes hear about the risk from AI and assume violence is the answer. When they post on LessWrong about it, they get immediate and overwhelming feedback that this is not helpful at all. If calls for violence were banned, I think many of those people would assume that we all secretly support violence but don't talk about it. The fact that discussion of violence is allowed but that LW users are overwhelmingly opposed to it is a stronger signal than ambiguous silence. You mention this but don't take it seriously.

The biggest actual risk is that people take these comments out of context and make it sound like LessWrong users are in favor of violence. ~~I find it hard to take the author's concern here seriously since they started this argument by posting an out-of-context screenshot with the karma removed implying that LessWrong supports violence.~~ [See comments]

Also I don't really see the point of your proposed policy of allowing calls for violence as long as the users dance around the topic. You'd get exactly the same practical effect except mods have to adjudicate if the user was calling for violence or just implying a call to violence.

by posting an out-of-context screenshot with the karma removed implying that LessWrong supports violence

The tweet literally said “highly downvoted” and was not in any way “out-of-context”. It did not imply LessWrong supports violence; I’ve made many comments explaining that the opposite is the case.

I think it's a false dichotomy to either allow all discussion of violence, including specific calls for killing specific people in a coordinated manner, or to not ever permit any discussion even of the kinds of situations where violence can be justified, at any degree of specificity.

This is true! And indeed, no such dichotomy has been proposed by me, so I think you must have misunderstood some of my comments here.

I don't have a great model about what the exactly correct lines here are, and am interested in what other people think good lines might be.

Tentatively, I would say that...

- People should be able to discuss whether or not to commit violence against large groups of people, like asking whether now is the time to have a revolution against the government, and it's not possible to fully discuss that without allowing for advocacy.^[2]
- Advocating for violence against highly specific targets in specific ways gets a lot dicier due to threat dynamics, and my guess is I would at the very least ask people to first make an argument for a broader reference class of plans that reads as less directly threatening.
- Using LessWrong to actually coordinate or execute plans^[3], would only be fine if somehow the people doing that had gotten community buy-in on that being a good idea (which e.g. doesn't seem implausible for something like a Russian resistance movement, or providing assistance to the Ukrainian war, but seems quite unlikely for any extrajudicial violence in any western country without very drastic changes in the political climate).^[4]

^[1] But like, if a group of volunteers was starting to coordinate how to join and assist the Ukraine war, or even start discussing taking specific violent actions in the Middle East on any side of the active conflicts over there, I think that would be fine? Obviously I would do things if that ends up threatening to have a big cultural effect on the site, but if it's in some random corner of the site, I probably wouldn't stop them, and the associated discussion seems like it could be quite insightful and help clarify various difficult ethical issues.
^[2] While I do not in general want to defer moral authority or content moderation decisions to U.S. law, the rough U.S. thresholds for what constitutes illegal violent speech seem helpful here. In the U.S., crossing the line between legal advocacy and illegal incitement requires the call to be for imminent violence, and for the call to be likely to succeed, and I would probably apply similar standards here.
^[3] Like in some absurd world where someone puts some kind of violent attack as an event on the LessWrong event map, or uses the LessWrong wiki to coordinate on specific plans
^[4] The section of the downvoted comment you screenshotted does not have this, but in e.g. that user's bio they currently have written that they are actively looking for "cofounders" to do some cybercrime stuff, and encouragement to reach out to them in private, and my guess is that crosses my line for what I want people to do on LessWrong, so we will probably ask them to edit that, but I haven't yet fully thought through all the precedent and implications of that.

Thank you. It appears that I have indeed misunderstood your position. I apologize for that and I apologize for the fallout.

There are clearly some things that would be over the line. If someone posts a comment being like "I will show up to <company office X> tomorrow and firebomb them, show up if you read this and want to participate", I would very likely take it down (and also report it to the police and share what info I have on who wrote it)

Very happy to hear this.

My two moderation comments on the issue do not read to me as implying this kind of dichotomy.

we e.g. deactivated his ability to send DMs on LessWrong

Apologies, I did not have this context^[3]. Thank you.

...if a group of volunteers was starting to coordinate how to join and assist the Ukraine war
...People should be able to discuss whether or not to commit violence against large groups of people, like asking whether now is the time to have a revolution against the government
... Using LessWrong to actually coordinate or execute plans... if somehow the people doing that had gotten community buy-in on that being a good idea

Yep, I agree all of those should be fine on LessWrong (and actually I think they're all fine on Twitter or Telegram).

that user's bio they currently have written that they are actively looking for "cofounders" to do some cybercrime stuff, and encouragement to reach out to them in private, and my guess is that crosses my line for what I want people to do on LessWrong

Also great to hear this.

^[1] [that still apply if someone's being a dick or making the discussion go off the rails]
^[2] "Heavily downvoted" is also the language I used in the tweet; I hope you can see that it wasn't transparent to me that this would not describe the general situation with the karma of such content when I made the tweet.
^[3] I did not expect this, given I reported his DMs but wasn't told of any action taken.

If someone posts something bad, and people demand that it be deleted, I view that as analogous to demanding the destruction of police body-cam footage

It's not as bad, quantitatively speaking, but it's the same kind of bad; both are steps to prevent incriminating information from being seen.

Thanks! I think I mostly agree with what you're optimizing for. Some comments:
