The section you've quoted here appears to equate authoritarianism with evil. I think it's fair to say that authoritarianism isn't the greatest good, but I don't think it's fair to say it's automatically evil. An authoritarian who places substantial value on people's well-being would seem to produce a future nearly as good as the one produced by your preferred utilitarian calculation.
Even a tiny amount of interest in other people's well-being can result in huge amounts of that well-being when you don't have to go to any personal effort; you just tell your ASI to do it.
Define well-being however you want; the fraction of effort devoted to it is the fraction of the maximum possible good outcome, by your definition, that gets realized.
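To make the arithmetic behind this concrete, here's a toy model (my own illustration, not from the thread; the capacity numbers and the linear-utility assumption are both hypothetical) of why even a tiny fraction of ASI effort devoted to others could dwarf what unaided humans achieve:

```python
# Toy model: assume well-being produced scales linearly with the
# fraction of ASI capacity devoted to it. All numbers are made up
# for illustration.

TOTAL_ASI_CAPACITY = 1e12  # hypothetical units of achievable well-being
HUMAN_BASELINE = 1e6       # hypothetical well-being achievable without ASI

def wellbeing_for_others(fraction_devoted: float) -> float:
    """Well-being produced for others, as a fraction of total ASI capacity."""
    return fraction_devoted * TOTAL_ASI_CAPACITY

# Even 0.1% of ASI effort exceeds the human baseline a thousandfold.
tiny_fraction = 0.001
print(wellbeing_for_others(tiny_fraction) / HUMAN_BASELINE)  # 1000.0
```

The point of the sketch is just the linearity: under this assumption, a selfish-but-not-hostile dictator who spares any nonzero fraction of capacity still produces that same fraction of the best possible outcome.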
The interesting thing about AGI is that an authoritarian in charge of it can have whatever they personally want, while also giving other people what they want, to the extent those don't conflict. If this authoritarian wants to be worshiped but not to actually determine most of people's actions day to day, those people can have most of what they want while the authoritarian can have all of what they want.
This is a complex argument, but I tend to think that anyone with a positive empathy-versus-sadism balance would tend to become more generous over time once they wield unlimited power. It seems like someone with that much power would come to understand the situation better and better over time, and would have to weigh their sadism against their empathy.
I'm not saying this is a solid argument; I'd like to develop it, because I think it might matter a lot for the reasons you describe. We are all too likely to have our ASI controlled by an authoritarian of some stripe.
> Even a tiny amount of interest in other people's well-being can result in huge amounts of that well-being when you don't have to go to any personal effort; you just tell your ASI to do it.
Even a tiny amount of interest in other people's suffering can result in huge amounts of that, too. I saw recently that over 10% of people think that if hell does not exist, we should create it.
> This is a complex argument, but I tend to think that anyone with a positive empathy-versus-sadism balance would tend to become more generous over time once they wield unlimited power. It seems like someone with that much power would come to understand the situation better and better over time, and would have to weigh their sadism against their empathy.
It seems clear that empathy and sadism are easily compartmentalized: it's natural to have high empathy for people within one's moral circle while feeling sadistic toward those outside it. So I don't think the "empathy vs sadism balance" really matters at all; the question is almost entirely about where they draw the moral circle. And that seems to depend on their susceptibility to abstract arguments, social pressure, and hostility. Since they control social pressure, that avenue isn't relevant. It also seems very hard to persuade people to expand their moral circle to include people hostile toward them, which is probably a lot of people if you're an authoritarian. Better hope you get your enlightened philosopher-king! When did we last even have one of those, Lee Kuan Yew?
Also, historically, increased power is correlated with increased sadism. People have this weird intuition that becoming more powerful would (at least in many cases) make them more "enlightened". I think it comes from imagining yourself with more power, and thus more affordance to think about and care about things that weren't salient before. I'm sure that would be true for us, but in practice power almost always seems to make people worse.
Right. I'm thinking of a situation with a permanent dictator who has lifetimes or aeons to learn, if they aren't totally unwilling. I'm hoping that over time their circle would expand through abstract arguments, primarily from their interactions with their ASI. And I'm very far from sure which types of people would expand their moral circle and which never would.
I wouldn't call this an intuition. I don't think people become more enlightened with power; I think they become more enlightened, on average, with knowledge. That's sharply limited by their fear and resentment, and by their biology; sociopathy is real, although it's probably not a clean zero-empathy thing.
I guess the main point is that this is a pretty deep psychological question, and no human in history has been in a position of power as complete and secure as an ASI-enabled dictator would be.
If your point is that this would be pretty terrible in the short term, yes. And the other options look pretty bleak too.
I don't expect an AGI dictator to remain fully psychologically human for an extended period of time ("lifetimes or aeons"), so I wouldn't bet much on weird quirks of the human psyche saving us all. Like,
> I'm hoping that over time their circle would expand through abstract arguments primarily from their interactions with their ASI
This assumes that there would be a "dictator and their ASI" split, as opposed to the dictator merging the ASI's capabilities into their own mind; this assumes the dictator would decide to retain the quirk of the human mind where humans' values could be rewritten by external abstract arguments; this assumes that the dictator would keep humans alive long enough for their moral circle to have a chance to expand and for that to do anyone any good; etc.
It also seems to assume that the moral circle would monotonically expand, as opposed to shrinking (e.g., as a result of the dictator growing increasingly callous playing with dolls) or oscillating. I wouldn't even be confident about that for fully psychologically human people.
Interesting.
I'm definitely not confident in any of this. I do think that these questions aren't being asked and may wind up being a rather large part of strategy questions on AGI.
If it were me, I'd want to carefully conserve most of how my mind worked while expanding carefully. But I'm not sure what that would mean for how my ethics might change over time.
> If it were me, I'd want to carefully conserve most of how my mind worked while expanding carefully
Sure, but if you have an aligned ASI tool in your hands, that still doesn't preclude radical or rapid changes (at least, rapid on the relevant "incidental expansion of the moral circle" scales). And even if those new types of changes are slow, I think that is still likely to break whatever subtle dynamics may arguably enable gradual moral-circle expansion in humans.
I don't know. If so, it's likely to also break any subtle dynamics that contract the moral circle. Right? It seems like this is moving into territory nobody has thought much about.
Yep. But my guess is that it would be a chaotic process, and that outcomes we'd consider acceptable are a narrow target, so on-expectation this would result in a (hyper)existential catastrophe.
I'd expect that for someone with very low priority on being good to other people. Would you expect this also for a fairly good but imperfect person?
I'd at least expect that someone who was pretty good - like most of the nicer folks around here - would take steps to prevent their morality changing enough that their current self would be horrified by their later one. So if they had human flourishing as a fairly high priority, it would probably remain one.
It seems like this would happen if human flourishing were their first priority; if it were second or third to other very different ones, I'd think it less likely to survive intact, but satisficing multiple preferences might preserve a lot of human flourishing if someone wielded that sort of power.
I haven't been able to find anyone really exploring this logic. I suspect it's out there somewhere.
> I'd expect that for someone with very low priority on being good to other people
Oh, I was assuming we're talking about that scenario specifically, yes: whether an initially-tyrannical person would end up pro-eudaimonia after lifetimes/aeons of godhood.
> I'd at least expect that someone who was pretty good - like most of the nicer folks around here - would take steps to prevent their morality changing enough that their current self would be horrified by their later one
Well, now this becomes a question of competence, not of their alignment to human values. If we assume that a good-but-imperfect person who considers human flourishing a high priority ends up in control of an AGI, and that they're careful enough not to accidentally lose control or self-modify into insanity, then sure, the end result will probably be fine. (It depends on the operationalization of "good but imperfect", though.)
> I haven't been able to find anyone really exploring this logic
@TsviBT pondered similar questions here, not sure if you saw that?
> I saw recently that over 10% of people think that if hell does not exist, we should create it.
I think that's the Lizardman's Constant. I wouldn't be surprised if 10% of people replying to a survey pick the strangest or most deranged option just to mess with the surveyor. The fact that this number is the same between the relatively individualist US, the relatively conformist UK, and the relatively fundamentalist Pakistan makes me lean toward "let's mess with this silly grad student" being the common motivation.
> Therefore, even if progress towards ASI is shut down, there doesn’t seem to be a very good off-ramp to turn this advantage into utopia.
Not all good futures need to result in utopia.