My previous girlfriend didn't want to live forever when I asked her years ago; not even 200 years as a healthy adult! But we watched an episode of Black Mirror where people can be temporarily uploaded into a fun virtual world and permanently stay there when they're near-death. Surprisingly, she said she would like to do that instead of dying a natural death.

Some things are only convincing when you think about it in concrete details, and fiction allows people to live through an experience in fine-grained details. I don't think my previous girlfriend would ever have been convinced of extending life to 200 years if she wasn't presented with a clear concrete story where it ended well. 

We can use fiction to show alignment failures to create a better cultural reference point than Terminator or Sorcerer's Apprentice. We can do it and have it funded by FTX (Project #33) to actually make a movie/series. 

But humanity has to win, right? But how do we show alignment failures, let alone multiple alignment failures if humanity has to win? 

With time loops. And everybody loves time loops.

The Plot

Similar to Dave Scum, Alice is just living her life until everyone dies, but time freezes and she can re-wind up to 1 month in the past. After enough time loops, she realizes it's an AGI optimizing for [X] and she convinces the developers (after several more time loops) to implement a patch so it doesn't kill everyone the way it always does.

So it then kills everyone in the next unblocked strategy.

After a montage of 10-15 patches, everyone dies except Alice who's being kept alive by the AGI (which figured out she was a direct cause of patching the AI's top-10 instrumental paths), which she then figures out how to activate the time-power w/o dying through [the power of friendship/true love] and swears off patches as a potential solution.

She then goes for the strategy of "buying more time" and convinces the developers to halt progress on this until they develop a more robust solution. This buys 3 more months until the world ends from a different company's AGI. She repeatedly tries to convince groups of people until one group will not be convinced despite her best efforts. This buys her a total of 5 years every loop.

Alice begins trying to tackle the core problems with several groups of people, bringing back their results through time, trying out more robust solutions every few years, until finally, they produce an AI that performs a pivotal act, maybe something like the world of 17776, but hopefully better. 

After a long reflection of [1000 years], humanity w/ AI assistants solve human ethics. Alice goes back to the beginning of her timeline and begins writing up the code for a recursively improving AGI in order to save everyone who died in the previous timeline and to reduce astronomical waste.

Then they all lived happily ever after; the end.

Next Steps

  1. Actually writing up a script or basic story (not me!)
  2. Curating a list of failed alignment proposals (all of them so far) which will be shown to fail.
  3. How to engage CCP-backed researchers? We do not want the movie/series to be banned in China if we hope to coordinate a common narrative with the CCP, so what's the best way to actually make this acceptable (and popular) in China? 

It actually doesn't have to be popular, just entertaining and illustrates the failure of the ~10 most common alignment proposals that everyone thinks of and, for a bonus, the failure of more clever proposals in a believable fashion. We really just want to send it to a small set of researchers if it's too hard to market/make popular.

Creating a Common Utopia

Another useful fictional production is a shared utopia (or pivotal act that produces that shared utopia) that the major AI researchers would agree would be pretty good. If successful, we can reduce race-to-AGI conditions since everyone is pursuing the same end goal. Current utopias include (spoils the ending of certain fictions):

New to LessWrong?

New Comment
11 comments, sorted by Click to highlight new comments since: Today at 10:44 AM

what's the best way to actually make this acceptable (and popular) in China?

Depict a conspicuously Chinese research team as well-meaning noble patriots, who are quick to think only of the common good, and who supply some of the crucial insights. I don't know if that would be sufficient, but it would be a start.

I like time loops and alignment stories, so I think this is cool. But I'd probably suggest writing a novel first. Its way cheaper, and there is a ready made audience of time loops fans on RoyalRoad. The number is a story on RoyalRoad about an alignment failure which did pretty well. Combine with a timeloop, and perhaps some other sci-fi elements (maybe a virtual game world? Would work for nerds, who are maybe your target audience?) and I think you've got a solid proposition. 

I'm not sure I like the stuff about a "long reflection". At least, not without some serious thought put into how to actually make such a thing work without the researchers/people entering a virtue signalling arms race. Or any of the other failure modes that would pop up. If you put a half-baked plan in there, anyone who you'd really want to convince would be put off or mislead.

Anyhoo, a friend of mine is making an Alignment themed video game to drill this point into people's heads. I feel like that might work better than a movie or a book, depending on how well it is executed. The experience can be much more finely tuned towards each individuals perception of how things should go. Then you guide them through the destruction of their plans, whilst letting them have fun, which should let you bypass the usual instinctive rejections. 

“I'd probably suggest writing a novel first.”

It blows my mind that nobody (?) has written a sci-fi novel on alignment yet.

It's actually kinda hard, if you want it to not be nonsense. And then you have to make it exciting.

I've thought about having alignment going on as a subplot in a conventional adventure story - woven in every few chapters - and emphasize at the end how meaningless the more-conventional story was in comparison to the alignment work. 

In terms of time-loop stuff, I think a protagonist who is demonstrably not smart enough to do the alignment work himself, and must convince the world's geniuses to work on alignment every loop, might be grimly amusing. 

I thought I came across one a few years ago, though it might have been a different x-risk. There's an alien civilization that's discovered that's dead (and 'one of their 'sciences' killed them off).

The Number is kind of an alignment novel, but you only see that late in the book. Arguably the Crystal Trilogy is a mis-alignment novel. Oh, and of course, there's Friendship is Optimal. 

I like this story pitch! It seems pretty compelling to me, and a clever way to show the difficulty and stakes of alignment. Good luck!

But humanity has to win, right? But how do we show alignment failures, let alone multiple alignment failures if humanity has to win? 

  • Drop that restriction.**
  • Research ways of doing this.*
  • Come up with ideas.*

*For instance:

An alien civilization is discovered. But they're all dead. What went wrong?

You might think this is easier with biotech. But 'AI enables better protein folding' or 'alien biology is simpler' are options.


**There are other things that can be done to make a story more interesting, than following 'regular scripts'.

(Some might be too difficult for a movie. What about a TV show? (Potentially, this can have more depth, or more breadth - like Black Mirror.))

Why have one villain (or any villains)? Or one existential risk? Meteor -> alien virus -> more research... (or searching for aliens/stuff that you'd expect to find in a universe where meteors carry viruses) -> hostile and friendly/neutral alien contact -> more war research and development -> drones become a focus (because humans can't always handle alien environments) -> drones get hacked by aliens -> ai research for countering threat (kind of hard to fight off cyber attacks when you're far away because you can't live on the planet) -> hostile ai running drones

This leaves open the option of an uninhabited world (rich with resources and/or strategic position for war) getting nuked.

Too long for one movie? Sure.

That's a very good idea, but as I learned from my novel it's hard to write likeable tech nerds or geeks that mainstream audiences can relate to.