I think there's something important here; I appreciate the post and would like to see more discussion. The simple stances people seem to take (whether or not they involve promoting enmity) have so far almost always seemed bad to me, or at least very far from fully figured out. (Related: https://www.lesswrong.com/posts/dENfZBhCzsR8ggfpt/escalation-and-perception-1 )
Anyway, I want to push back on the basic point in this post. First, I think we can agree that enmity doesn't have to be symmetric, and often isn't. Second, I think that if A is treating B as an enemy and B is not treating A as an enemy, this may be bad for B compared to B treating A more as an enemy. I think that people who are promoting enmity would often claim to be in this position. In other words, they believe they are on team B, and currently team B is making a mistake by not sufficiently treating A as an enemy. While it seems true that promoting symmetric enmity pretty much always comes with a big cost (though it could in theory come with benefits too), which is approximately what I think you were saying, it doesn't seem true that promoting one-way enmity necessarily comes with a big cost compared to the alternative.
In the background, I think enmity is multidimensional; even bitter enemies who generally seek to degrade each other's agency might still play by various rules (e.g. banning chemical weapons), or be open to reconciliation on some longer time frame. So there are escalation/de-escalation dynamics on many dimensions, and a delicate balance in how to avoid enmity as much as possible without opening yourself to exploitation ( https://www.lesswrong.com/posts/34mDRmAbfkaMfoAcR/a-prayer-for-engaging-in-conflict ).
While it seems true that promoting symmetric enmity pretty much always comes with a big cost (though it could in theory come with benefits too), which is approximately what I think you were saying, it doesn't seem true that promoting one-way enmity necessarily comes with a big cost compared to the alternative.
On my model, the large additional cost that it imposes is half the point. If you're in a situation of one-sided enmity, and you would rather not be, then an obvious thing to do is destructively seize leverage and better your BATNA by making that particular tragedy symmetrical. Then your counterparty, which was previously treating you as an enemy with impunity, will now have a reason to come to the table (and you win), or be forced to engage in open public warfare when previously they may have been deniably catty (and you win by means of everyone losing).
On this model, it is if anything much worse when serious one-sided enmity gets started, or when one of the parties behaves so as to be indistinguishable from an enemy, or acts rashly and then tries to avoid paying repair costs. It's also especially bad when battle lines are unclear, such that from one reasonable point of view, person P is obviously on team T, and from another reasonable point of view, they're equally obviously not.
I of course agree that warning people about their enemies can be helpful to the people being warned.
I don't agree that's what Eliezer is doing with Amodei and Hegseth. Across multiple posts, he is promoting enmity toward both of them. Here he is saying that Hegseth's position is even worse than Amodei's: https://x.com/allTheYud/status/2027357747960766554?s=20
It's of course logically consistent to say two people have bad positions and that one is worse. The strange thing is to be doing that while performatively "helping" Hegseth by "warning" him that AI leaders would "discard him like used toilet paper". Taken in that context, the effect is more like symmetrically creating enmity between the two camps, as opposed to encouraging a positive resolution to the conflict.
I'll also reiterate that Eliezer is far from alone in this pattern of promoting enmity between leaders in and around AI. He merely offers an easy-to-analyze public example of the pattern.
Ok, thanks for the clarification. I think I see a bit more what you're getting at.
But, rereading Yudkowsky's tweet, I think your description
performatively "helping" Hegseth by "warning" him that AI leaders would "discard him like used toilet paper"
is not accurate. I think the group being addressed in the tweet is "political leaders of the world". Admittedly, the tweet is confusingly worded in this regard, and I had to reread it. In the beginning, he says
You have proven that you stand between AI labs and the nice thing they were getting for all their hard work.
and later
Sam Altman does not now look more powerful because you crushed his competitor. He looks less important because you, politicians, crushed his competitor, and did so in a way that made clear that Altman would have to take the orders of any Trump appointee as well.
I think he's using that as a rhetorical device to express "there's a conflict here between [AI leaders as a group] and [political leaders as a group]", because (he claims) that's how AI leaders are taking it. In this light, I think he's trying to alert B (political leaders in general) that A (AI leaders) are their enemy.
While it seems true that promoting symmetric enmity pretty much always comes with a big cost
I'm not sure about that at all. Suppose that Group A is pursuing an activity that is seriously harmful to Group B, not out of malice but out of negligence or some other mistake, and that Group B is currently doing little to resist it. Then promoting even symmetric enmity between the groups would be beneficial for Group B.
(Incidentally, note that even if neither party is actively malicious towards the other, "promoting enmity" wouldn't necessarily involve lying. Feelings of enmity can be fueled by pointing at Group A's negligence (or whatever flaws are causing them to mistakenly pursue the harmful-to-B activity), rather than by ascribing malicious intent to them. To a significant extent, it's about framing.)
Well, the parenthetical says "(though it could in theory come with benefits too)"; you could be making a good case for that happening in practice. My statement is just agreeing with the OP that there is a big cost pathway there. Even in your example, a bunch of bad stuff happens from increased symmetric enmity: it would probably become harder to help A see the error of their ways; the B team would probably end up doing a bunch of enemy stuff purely for the sake of enmity that doesn't actually help much; A might become B's enemy and then successfully smack down B; both sides would be spending resources on conflict rather than on anything in the intersection of ~all human values (such as, for example, A spending more resources on thinking and getting a better understanding of how their plans are bad for the world / for B); etc. I think the OP is trying to say this narrow point, e.g.:
[...] I think those activities are increasing AI risk, including but not limited to extinction risk. However, that's a stronger claim than I intend to argue here. Rather, I'll just be presenting a simple and harmful causal pathway [...]
Can you moderate the promotion of enmity without escalating social violence?
If anyone can recall successful instances of this happening in AI, could you reply here with links? I would love to share and celebrate the role models.
People who are attempting to cause serious harm need to be stopped. If someone is currently attempting murder, it's not reasonable to look for mutually beneficial arrangements with them; they need to be restrained and put in prison. I'm not open to peaceful coexistence with people who insist on building something that will likely destroy the human race. No compromise is possible; they simply can't be allowed to build it. If they won't stop voluntarily, they absolutely are enemies.
It seems to me that you're making a bucket error between "someone trying to murder is on the other side of a conflict", and "someone trying to murder is going to persistently keep trying to murder even if they obtain the things they say they want right now, because they at-least-somewhat-terminally value murder". Perhaps there are ways to say things in language that is conditionally sedate and does not attribute the behavior you want to stop to an identity feature that the accused person is likely to internalize? For example, if someone is told "by participating in [x common thing], you are a murderer", they seem more likely to consider themselves morally licensed to do other things that are called murder by angry people online. The argument isn't "they're not doing something that should be stopped", which is what I see you as responding to; it's "try not to write to their identity slot or to others' identity slot for that person, when accusing them of bad behavior".
"Mass manslaughterer" seems more accurate than "mass murderer" anyway, and might be lower on the scale OP describes.
Pushing toward ASI isn't actually a common thing, only a tiny fraction of people are doing that. I think it's unlikely, but if my words cause someone to quit pushing toward ASI but feel morally licensed to do other bad things, I think I'd still consider that a win, since pushing toward ASI is one of the worst things a person can do. I think there are people out there who at-least-somewhat-terminally value murder, but are prevented from committing murder by moral disapprobation and threat of punishment, so it's important not to push those things outside the Overton window for fear of causing bad vibes.
I'm more worried about the case where your words not only fail to stop them from pushing toward ASI, but make them feel that, since you intend for them to consider themselves a bad person for doing it, they should think of themselves as intentionally evil and take other intentionally evil actions; or where someone else who would have tried to get them to stop by talking to them now thinks of them as impossible to talk to, and treats them as beyond the reach of communication and request. There's a space between "say they're doing a bad thing" and "say they're inherently a kind of person who is inclined to do bad things", or "say they're impossible to pressure via ordinary means and must be pressured via unusual means". I am not asking you to say what they're doing is fine, and I would understand if the split that those-who-agree-with-OP are asking you to make wasn't a split you were previously treating as notable.
Also, for the record, I'd volunteer my time to talk with anyone who is currently doing capabilities research at an AI research group, or who is seriously considering doing that, and try to explain why they shouldn't but in a way that is kind and open and understanding. (I don't think I have a legible track record of doing this, but I would unaccountably claim that there's a significant chance I'd be good at this for a substantial subset of such people.)
There's this one Upton Sinclair quote I think about a lot in this context. I imagine you've seen it?
Chewing it over more, I think you may have neglected to consider Newcomblike self-deception as a possible factor in Sinclair's razor. It's not necessary for the person to be lying about what they believe, or for them to have consciously convinced themselves of that lie. They can just have a big convenient cognitive blind spot.
Well, I meant that to be included under "has really convinced themselves", where you're proposing that they could have convinced themselves unconsciously. (Which I agree happens, via a bunch of little ugh fields and piecemeal distorted-world construction.)
Feel free to make an edit to clarify though, it's a wiki!
I would rather say that you should stop them in whatever way is most effective. If a peaceful compromise gets them to stop, do that. If forcing them to stop works better, do that. There is nothing morally wrong with trying to stop someone who's trying to kill you, but that doesn't mean it's your best strategy.
If someone is currently attempting murder, it's not reasonable to look for mutually beneficial arrangements with them; they need to be restrained and put in prison.
This is locally true, but also I want to point out similar situations where the reasoning isn't quite the same:
I'm not open to peaceful coexistence with people who insist on building something that will likely destroy the human race.
I myself would support laws imprisoning people selling out humanity and getting rich in the process (by building AGI via ML scaling), but I think violence is a really important civilizational Schelling line and very dangerous to cross, and, while I admit I am still confused about the lines here, I am uncomfortable reading this. Your comment pattern-matches, for me, to a pretty naive escalation across ethical lines, rather than a considered and mature one. You're vaguely threatening violence almost reactively, rather than making any case that it would improve things or that this is one of the rare times where it's okay to cross the line. This rhymes for me with a childish, impotent desire to have power over people who are doing terrible things, which is understandable, but doesn't justify just any means of doing it.
I was vague because I don't think it's actually prudent to threaten anyone at this time, but I do think it's important to defend the possibility of talking about it. Of course I know it's possible to be counterproductively aggressive, but I guess I'm getting a little edgy because my sense is that almost all people reading LW err on the side of being way too conflict-avoidant. Being ready and willing to fight (including nonviolent resistance) can have advantages even if no fighting actually occurs, but it requires, among other things, being able to identify enemies.
People who are attempting to cause serious harm need to be stopped.
This is such a weird phrasing, because it attributes the need to the person it's proposing to stop, instead of to the people who want them stopped.
Consider: If Julius Caesar is attempting to cause serious harm to the Gauls, it's not Caesar who needs Caesar to be stopped; it's the Gauls who need Caesar to be stopped.
I used the passive voice because identifying who does the stopping isn't directly relevant to the topic of whether it's good or bad to promote enmity.
Why is this so downvoted (in addition to upvoted)? It makes clearly stated interesting points on an important topic. Argue or update.
Simply put, when groups of humans and/or machines have a high prior that they need to destroy each other in order to achieve their goals, they are more likely to do that than if they have a high prior on being able to find mutually beneficial arrangements. And, there are things you can do to increase or decrease that prior.
To be fair, I don't think that Eliezer's tweet to someone called the Secretary of War is the biggest contribution to anyone's impression that destroying each other is a way to accomplish goals. I am not sure what the issue is, and I think Yud's position is roughly correct, though I also worry about getting the administration into this, because my model of them is that they're just the kind of guys who want the eldritch power for themselves, confident that they're Born Different and won't be eaten by it like everyone else.
It seems like the point of his post was specifically to promote enmity, which is usually bad. In this case maybe he thinks it's good because if AI companies and the government are fighting each other it could increase the likelihood the government does something to slow down AI capabilities? I can't rule that out, but it seems like some 4d chess that I wouldn't want to meddle in. Things can change quickly, and none of us fully understand the consequences of our words and actions. So maybe what he's saying is true, but saying true things in a confrontational way is not always helpful. In this case I think if his words have any meaningful effect, it will likely be to galvanize governments to limit safety measures by AI developers, in the name of sovereignty.
I'm not sure what the safe plays are at this point, especially if you think that existential risk is on the line. But I also don't think almost any of them are, like, completely devoid of "enmity". Enmity makes people act hard and fast. And it would in fact be legitimate to feel enmity toward someone who is risking your life or trying to take away your position of power by scheming. I don't think it's good to be some kind of Machiavellian schemer as a first resort, but I also don't think you can or should confront problems of this magnitude and with these stakes while tiptoeing around to make sure you never suggest anyone should be anyone else's enemy. At some point, you call a spade a spade. If you can't bring yourself to say that maybe the people who want to do the thing that kills everyone are the enemy, no one is going to believe you're really as concerned as you say you are.
Can you moderate the promotion of enmity without escalating social violence?
Yep, I'm pretty sure that's doable.
Hmm, it seems like a more important question is whether it's possible to do so without promoting and optimizing for something other than (individual or collective) truthseeking, at least sometimes. I don't think it's always possible to do so.
To use a different example than the one you picked (AI leaders <-> DoW leadership), consider the relationship between Dario Amodei and Sam Altman:
https://x.com/sama/status/2019139174339928189
https://www.cnbc.com/2026/02/19/openai-sam-altman-anthropic-dario-amodei-india-ai-summit.html
I think it is indeed obvious to anyone who has been paying close attention and knows the history that their relationship very likely goes beyond mere conflict and competition. So to avoid "promoting enmity" on this topic / relationship, I would need to think about the considerations in this post, and potentially change my words and actions based on that thinking, before speaking in a group setting or posting on social media about it.
I think these considerations will necessarily be distortionary in many cases, and that this distortion is bad for at least two reasons:
I've observed some people engaged in activities that I believe are promoting enmity in the course of their efforts to raise awareness about AI risk. To be frank, I think those activities are increasing AI risk, including but not limited to extinction risk. However, that's a stronger claim than I intend to argue here. Rather, I'll just be presenting a simple and harmful causal pathway and some strategies that can be used for mitigating it:
PromotingEnmity → Conflict → Catastrophe (PE→C→C)
(Enmity is not the same as conflict, which can sometimes be constructive. Parties in conflict can be quite focussed on finding a mutually beneficial solution, even if that solution is difficult to find. By contrast, enemies do not generally pursue positive trade relations with each other. So, enmity is particularly relevant to watch out for when pursuing a positive future.)
Promoting enmity
Suppose groups X and Y are in a tense and dangerous relationship for some reason. If I say "Obviously X Leader and Y Leader hate and want to destroy each other", I'm promoting the hypothesis that they're enemies, and if they believe me, I might also be making it a bit more likely that they'll become or remain enemies.
In short, promoting enmity means raising the hypothesis of enmity to attention in ways that make actual enmity more likely. It's a kind of hyperstition.
The enmity doesn't have to be toward or from the speaker, so it's not necessarily like "hate speech" in that way. But as with hate speech, promoting enmity between groups is particularly consequential, and often avoidable. Even if you fastidiously avoid lying, even when you confidently believe enmity is present, you can still make choices to avoid promoting enmity, by deciding how much, how often, and where to bring it up.
Examples
Here are some increasingly intense examples of promoting enmity, with some intensity labels:
not promoting enmity: Alice is asked privately by a member of Group Z whether the leaders of Groups X and Y hate each other. She responds, "I'm not sure" or "I'd rather that be a question for them than for me."
minimally promoting enmity: Alice once tells a colleague unconnected to X and Y, "I think X Leader and Y Leader basically hate each other."
weakly promoting enmity: Alice tells a few colleagues connected to X and Y, "I think X Leader and Y Leader basically hate each other."
moderately promoting enmity: In a group meeting involving X and Y, Alice says "Well, obviously X and Y want to destroy each other if they can."
strongly promoting enmity: In a high-profile social media post, Alice says "X Leader, make no mistake, Y Leader hates you and wants to destroy you."
If people are already convinced the enmity between X and Y is present, such that Alice's promotion of it doesn't have much marginal impact, I'm still calling the fourth level strong, because it's strong relative to Alice's other options for how emphatically to assert the enmity.
Is anyone actually promoting enmity like this around AI?
I think a bunch of people are doing this to some degree. Hundreds maybe? Activism in particular seems prone to promoting enmity, because dramatic stories about conflict between enemies attract attention, and are thus quite sticky as means of "raising awareness". Many of my observations here are from private or semi-private conversations with AI safety activists, which it would not be polite to call out publicly.
That said, to clarify that I'm not simply imagining this pattern, here is a public tweet from Eliezer Yudkowsky where he claims to Secretary of War Pete Hegseth that AI company leaders would "discard him like used toilet paper":
https://x.com/allTheYud/status/2027560852048458120
Personally I don't think the company leaders would do that. But irrespective of that, it's interesting to note that Eliezer was met with very little criticism for promoting enmity here. I'm not sure why that is. There were posts disagreeing with him, but no highly ranked responses about how this was plausibly a bad thing for Eliezer to say even if he believes it. In particular, none of the top-ranked replies to the post were people saying, "Whoah there, are you sure it's helpful to promote enmity between military leaders and AI developers like this?".
In such situations, I think it's important to consider the sorts of equilibria that speech encourages. This shouldn't be the only important consideration when choosing speech, but it is a consideration.
How can promoting enmity increase AI risk?
Simply put, when groups of humans and/or machines have a high prior that they need to destroy each other in order to achieve their goals, they are more likely to do that than if they have a high prior on being able to find mutually beneficial arrangements. And, there are things you can do to increase or decrease that prior.
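To make that prior-dependence concrete, here is a minimal toy sketch (my own illustration, not anything from the post; the function names and all payoff numbers are made up) of how raising a party's prior that the counterparty is an implacable enemy can flip its best response from deal-seeking to preemptive conflict:

```python
# Toy model (illustrative only): how the prior that the other party is an
# implacable enemy changes the expected value of seeking a deal vs. striking first.
# All payoffs are made-up numbers chosen to show the flip, not estimates of anything.

def ev_seek_deal(p_enemy, deal_payoff=10.0, betrayed_payoff=-25.0):
    """Expected value of pursuing a mutually beneficial arrangement,
    given probability p_enemy that the counterparty simply defects/attacks."""
    return (1 - p_enemy) * deal_payoff + p_enemy * betrayed_payoff

def ev_strike_first(strike_payoff=-5.0):
    """Expected value of preemptive conflict: a sure cost, but no betrayal risk."""
    return strike_payoff

for p_enemy in (0.1, 0.5, 0.9):
    deal, strike = ev_seek_deal(p_enemy), ev_strike_first()
    choice = "seek a deal" if deal > strike else "strike first"
    print(f"P(enemy) = {p_enemy:.1f}: EV(deal) = {deal:+.1f}, "
          f"EV(strike) = {strike:+.1f} -> {choice}")
```

On these made-up numbers, the same party that would happily negotiate at a 10% prior prefers conflict at 50% or 90%, which is the sense in which nudging that prior up or down matters.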
Can you moderate the promotion of enmity without escalating social violence?
Yep, I'm pretty sure that's doable. Here are some example responses:
"I don't think it's helpful for the world when you promote enmity around AI like this; it pushes for bad equilibria between people and groups."
"What you're saying here seems more like it's promoting enmity than trying to resolve or address conflict."
"I think there are better ways to address and resolve conflicts between people, versus what you've said here, which seems more escalatory than helpful."
Moderation vs tone-policing
One way moderation can backfire is if you personally escalate negative hyperstition or threats beyond what is already present in the conversation. "Tone policing" is a useful label for this.
On the other hand, if you try to gently moderate negative hyperstition, like promoting enmity, you might still be accused of tone-policing. In that case, you can at least offer the following pushback:
Closing thoughts
In simple terms, promoting enmity creates a bad vibe around AI where groups of humans and/or AIs are more likely to hate each other and employ their capabilities to destroy each other and/or the world. And, there may be things we can do to moderate or de-escalate such bad vibes.
What's a "bad vibe"? Here I just mean a heightened Bayesian posterior that other parties are acting in bad faith, i.e., are not open to peaceful coexistence or mutually beneficial relations. In a purely utility-theoretic example, if Alice becomes convinced that Bob's utility function is the negative of Alice's, she will not have much hope of finding Pareto-positive outcomes with Bob.
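Spelling out that example a bit (my own gloss; the outcome set X is introduced just for illustration): if Alice believes the interaction is exactly zero-sum, then

\[
u_B(x) = -u_A(x)\ \text{for all } x \in X
\quad\Longrightarrow\quad
\big(u_A(y) > u_A(x) \iff u_B(y) < u_B(x)\big),
\]

so no outcome Pareto-dominates any other: every change Alice would prefer is one Bob would resist, and from her perspective there is nothing left to negotiate over.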
It's hard to judge how much any given statement will promote enmity around AI, and whether the causal pathway «PromotingEnmity → Conflict → Catastrophe» is outweighed by other beneficial causal pathways from open discourse. But sometimes you can get the good without the bad. So, my goal in writing this post has been to draw a bit more attention to the potentially harmful effects of promoting enmity, and some ways to avoid or mitigate those effects.