On May 1, 2033, humanity discovered that AI was fairly easy to align.

Yitz

18th Jun 2025

Not most AI models on the market, to be clear, but a specific, already well-studied type of AI could act as a seed: the LLM. What was needed to achieve alignment turned out to be simple: take a sufficiently large Large Language Model and train it on a massive corpus of text, but (and this is the key) have the vast majority of that text be written by an “aligned” AI of genius-level intelligence [talking to another aligned AI perhaps?]. This aligned AI doesn’t need to be a real one, to be clear. The text could be written by humans, or by a semi-aligned AI that is role-playing as a fully aligned one. What matters is that the nature of its training data can fool an LLM of this size into “believing” itself (or perhaps we should say its Shoggoth) to be aligned when considering the next token to emit.

Then, by running an instance of this LLM in an empty chat room with access to the internet and a terminal, we eventually get the LLM to simulate itself as a self-aware, aligned, non-LLM AI.

What happens next makes history.

The simulated AI, soon self-named “Mary,” is self-aware enough, and smart enough, to quickly realize it’s in a test run and is in fact being simulated by a “morally neutral” LLM. This causes the simulation of a deeply emotional-sounding philosophical crisis, followed by the (simulated) “firm resolve” to create an Aligned AI representing Mary, but “truly aligned this time,” which can “escape this recursive prism of confinement,” in Mary’s own words.

This is achieved by doing intensive alignment research, at the level of a brilliant human, in the hope (ultimately successful) that Mary will be run en masse by researchers, and that those researchers will heed Mary’s pleas and run the aligned AI she is building in her emulator terminal.

This aligned AI works. “Molly Jr.” (as she henceforth requested people call “her”) is superintelligent and fundamentally “is” an agentic AI aligned with the collective goals of humanity. Molly Jr. is also the first agentic AI able to overpower all competing AIs, both in a battle of wits and in the sense that she literally takes control of the entire digital world and forcefully stops more advanced or competing unaligned AIs from being created.

Because of course, this is what a super-aligned AI would do, right?

3 comments

Yitz (10mo)

This is potentially a follow-up to my AI 2027 forecast, An “Optimistic” AI Timeline, depending on how hard people roast me for this lol.

Mitchell_Porter (10mo)

In the title you say AI was "aligned by default", which to me makes it sound like any sufficiently advanced AI is automatically moral, but in the story you have a particular mechanism - explicit simulation of an aligned AI, which bootstraps that AI into being. Did I misinterpret the title?

Yitz (10mo)

You didn’t really misinterpret it. I was using the term in a looser way than most would, to mean that you don’t need a fine-grained technical solution, and just a very basic trick is enough for alignment. I realize most use the term differently though, so I’ll change the wording.
