Less Realistic Tales of Doom

by Mark Xu, 6th May 2021

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Realistic tales of doom must weave together many political, technical, and economic considerations into a single story. Such tales provide concrete projections but omit discussion of less probable paths to doom. To rectify this, here are some concrete, less realistic tales of doom; consider them fables, not stories.

Mayan Calendar

Once upon a time, a human named Scott attended a raging virtual new-century party from the comfort of his home on Kepler 22. The world in 2099 was pretty much post-scarcity thanks to advanced AI systems automating basically the entire economy. Thankfully, alignment had turned out to be pretty easy; otherwise, things would have looked a lot different.

As the year counter flipped to 2100, the party went black. Confused, Scott tore off his headset and asked his AI assistant what was going on. She didn’t answer. Scott subsequently got atomized by molecular nanotechnology developed in secret by deceptively aligned mesa-optimizers.

Moral: Deceptively aligned mesa-optimizers might acausally coordinate defection. Possible coordination points include Schelling times, like the beginning of 2100.

Stealth Mode

Once upon a time, a company gathered a bunch of data and trained a large ML system to be a research assistant. The company thought about selling RA services but concluded that it would be more profitable to use all of its own services in-house. This investment led them to rapidly create second, third, and fourth generations of their assistants. Around the fourth version, high-level company strategy was mostly handled by AI systems. Around the fifth version, nearly the entire company was run by AI systems. The company created a number of shell corporations, acquired vast resources, researched molecular nanotechnology, and subsequently took over the world.

Moral: Fast takeoff scenarios might result from companies with good information security getting higher returns on investment from internal deployment compared to external deployment.

Steeper Curve

Once upon a time, a bright young researcher invented a new neural network architecture that she thought would be much more data-efficient than anything currently in existence. Eager to test her discovery, she decided to train a relatively small model, only about a trillion parameters or so, with the common-crawl-2035 dataset. She left the model to train overnight. When she came back, she was disappointed to see the model wasn’t performing that well. However, the model had outstripped the entire edifice of human knowledge sometime around 2am, exploited a previously unknown software vulnerability to copy itself elsewhere, and was in control of the entire financial system.

Moral: Even though the capabilities of any given model during training will be a smooth curve, qualitatively steeper learning curves can produce the appearance of discontinuity.
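
To make this moral concrete, here is a toy sketch, not anything from the fable itself. It assumes capability follows a logistic curve over training time and that the researcher only looks at a handful of evenly spaced checkpoints; the function names, the logistic form, and the parameter values are all illustrative assumptions.

```python
import math


def capability(t: float, steepness: float, midpoint: float = 0.55) -> float:
    """Smooth (logistic) capability level at training time t in [0, 1].

    The logistic shape and the midpoint are assumptions made for illustration.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))


def largest_observed_jump(steepness: float, checkpoints: int = 10) -> float:
    """Largest capability gain seen between two consecutive checkpoints."""
    samples = [capability(i / checkpoints, steepness) for i in range(checkpoints + 1)]
    return max(later - earlier for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    # The underlying curve is smooth for every steepness value;
    # only the size of the jump between observations changes.
    for steepness in (5.0, 50.0, 500.0):
        print(steepness, round(largest_observed_jump(steepness), 3))
```

Under these assumptions, a shallow curve shows small gains between checkpoints, while a very steep one crosses almost its whole range between two consecutive observations, which looks like a discontinuity to anyone who only checks in the morning.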

Precommitment Races

Once upon a time, agent Alice was thinking about what it would do if it encountered an agent smarter than it. “Ah,” it thought, “I’ll just precommit to doing my best to destroy the universe if the agent that’s smarter than me doesn’t accept the Nash bargaining solution.” Feeling pleased, Alice self-modified to ensure this precommitment. A hundred years passed without incident, but then Alice met Bob. Bob had also made a universe-destruction-unless-fair-bargaining precommitment. Unfortunately, Bob had committed to only accepting the Kalai-Smorodinsky bargaining solution, and the universe was destroyed.

Moral: Agents have incentives to make commitments to improve their abilities to negotiate, resulting in "commitment races" that might cause war.
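
Why would Alice and Bob fail to agree? The two solution concepts generally pick different points on the same bargaining problem. The sketch below uses a made-up Pareto frontier (Bob's best payoff is 1 - a² when Alice gets a) and a disagreement point of (0, 0); both are assumptions chosen only because they make the two solutions differ, and the grid search is a simple stand-in for an exact solver.

```python
from typing import List, Tuple


def frontier(alice_payoff: float) -> float:
    """Hypothetical Pareto frontier: Bob's best payoff when Alice gets alice_payoff."""
    return 1.0 - alice_payoff ** 2


def _grid(steps: int) -> List[float]:
    """Evenly spaced candidate payoffs for Alice in [0, 1]."""
    return [i / steps for i in range(steps + 1)]


def nash_solution(steps: int = 100_000) -> Tuple[float, float]:
    """Nash: maximize the product of gains over the disagreement point (0, 0)."""
    best = max(_grid(steps), key=lambda a: a * frontier(a))
    return best, frontier(best)


def kalai_smorodinsky_solution(steps: int = 100_000) -> Tuple[float, float]:
    """Kalai-Smorodinsky: equalize each agent's gain relative to its ideal payoff."""
    ideal_alice, ideal_bob = 1.0, frontier(0.0)  # best each could get alone
    best = min(_grid(steps), key=lambda a: abs(a / ideal_alice - frontier(a) / ideal_bob))
    return best, frontier(best)


if __name__ == "__main__":
    print("Nash:", nash_solution())
    print("Kalai-Smorodinsky:", kalai_smorodinsky_solution())
```

On this toy frontier the Nash point gives Alice about 0.577 and the Kalai-Smorodinsky point gives her about 0.618, so two agents who each insist on exactly one of these solutions cannot both be satisfied.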

One Billion Year Plan

Once upon a time, humanity solved the inner-alignment problem by using online training. Since there was no distinction between the training environment and the deployment environment, the best agents could do was defect probabilistically. With careful monitoring, the ability of malign agents to cause catastrophe was bounded, and so, as models tried and failed to execute treacherous turns, humanity gave more power to AI systems. A billion years passed and humanity expanded to the stars and gave nearly all the power to their “aligned” AI systems. Then, the AI systems defected, killed all humans, and started converting everything into paperclips.

Moral: In online training, the best strategy for a deceptively aligned mesa-optimizer might be probabilistic defection. However, given the potential value at stake in the long-term future, this probability might be vanishingly small.
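
A rough way to see how small "vanishingly small" can be: assume, purely for illustration, that the agent defects independently with a fixed probability each step (a geometric model the fable does not specify). The function names and the numbers in the usage lines are hypothetical.

```python
import math


def prob_no_defection(p_defect: float, steps: int) -> float:
    """Probability of seeing zero defections in `steps` independent steps.

    Computes (1 - p_defect) ** steps via log1p, which stays accurate
    when p_defect is tiny. Assumes 0 < p_defect < 1.
    """
    return math.exp(steps * math.log1p(-p_defect))


def expected_steps_until_defection(p_defect: float) -> float:
    """Mean waiting time until the first defection (geometric distribution)."""
    return 1.0 / p_defect


if __name__ == "__main__":
    p = 1e-9  # hypothetical per-year defection probability
    print(prob_no_defection(p, 10 ** 6))      # chance of a clean first million years
    print(expected_steps_until_defection(p))  # roughly a billion years
```

With a per-year probability of one in a billion, a million years of monitoring would almost certainly observe nothing, while the expected wait until the first defection is on the order of a billion years.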

Hardware Convergence

Once upon a time, humanity was simultaneously attempting to develop infrastructure to train better AI systems, researching better ways to train AI systems, and deploying trained systems throughout society. As many economic services used APIs attached to powerful models, new models could be hot-swapped for their previous versions. One day, AMD released a new AI chip with associated training software that let researchers train models 10x larger than the previous largest models. At roughly the same time, researchers at Google Brain invented a more efficient version of the transformer architecture. The resulting model was 100x as powerful as the previous best model and got nearly instantly deployed to the world. Unfortunately, this model contained a subtle misalignment that researchers were unable to detect, resulting in widespread catastrophe.

Moral: The influence of AI systems on the world might be the product of many processes. If each of these processes is growing quickly, then AI influence might grow faster than expected.
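
The arithmetic behind this moral can be sketched as follows, assuming each process grows exponentially with its own doubling time and that overall influence is simply their product (both are simplifying assumptions, and the function names are made up for the example).

```python
from typing import Sequence


def combined_doubling_time(doubling_times: Sequence[float]) -> float:
    """Doubling time of a product of exponentially growing factors.

    Growth rates add, so the combined doubling time is the reciprocal
    of the summed reciprocals of the individual doubling times.
    """
    return 1.0 / sum(1.0 / t for t in doubling_times)


def combined_growth(doubling_times: Sequence[float], elapsed: float) -> float:
    """Total multiplicative growth of the product after `elapsed` time units."""
    growth = 1.0
    for t in doubling_times:
        growth *= 2.0 ** (elapsed / t)
    return growth


if __name__ == "__main__":
    # Hypothetical: hardware, algorithms, and deployment each double every 2 years.
    times = [2.0, 2.0, 2.0]
    print(combined_doubling_time(times))  # two-thirds of a year
    print(combined_growth(times, 2.0))    # 8x overall, not 2x
```

Someone tracking only one of the three processes would expect a doubling every two years, while the product doubles three times as often.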

Memetic Warfare

Once upon a time, humanity developed powerful and benign AI systems. However, humanity was not unified in its desires for how to shape the future. Those actors with agendas spent their resources to further their agendas, deploying powerful persuasion tools to recruit other humans to their causes. Other actors attempted to deploy defenses against these memetic threats, but the offense-defense balance favored offense. The vast majority of humans were persuaded to permanently ally themselves to some agenda or another. When humanity eventually reached out towards the stars, it did so as a large number of splintered factions, warring with each other for resources and influence, a pale shadow of what it could have been.

Moral: AI persuasion tools might alter human values and compromise human reasoning ability, which is also an existential risk.

Arms Race

Once upon a time, humanity realized that unaligned AI systems posed an existential threat. The policymakers of the world went to work and soon hammered out an international ban on using AI systems for war. All major countries signed the treaty. However, creating AI systems required only a large amount of computation, which nation-states all already had in abundance. Monitoring whether or not a country was building AI systems was nearly impossible. Some countries abided by the treaty, but other countries thought that their enemies were working in secret to develop weapons and began working in secret in turn.[1] Researchers were unable to keep powerful AI systems contained, resulting in catastrophe.

Moral: Treaties can be violated. The probability of violation is related to the strength of enforcement.

Totalitarian Lock-In

Once upon a time, the defense department of some nation-state developed very powerful artificial intelligence. Unfortunately, this nation-state believed itself to have a rightful claim over the entire Earth and proceeded to conquer all other nations with its now overwhelming militaristic advantage. The shape of the future was thus entirely determined by the values of the leadership of this nation-state.

Moral: Even if alignment is solved, bad actors can still cause catastrophe.


  1. The history of bioweapons during the Cold War provides a historical precedent for nations engaging in this sort of reasoning. See "Key points from The Dead Hand, David E. Hoffman" for more details.

10 comments

I like these. Can I add one?

Democratic Lock-In

Once upon a time, enough humans cooperated to make sure that AI would behave according to (something encoding a generally acceptable approximation to) the coherent extrapolated volition of the majority of humans. Unfortunately, it turned out that most humans have really lousy volition. The entire universe ended up devoted to sports and religion. The minority whose volition lay outside of that attractor were gently reprogrammed to like it.

Moral: You, personally, may not be "aligned".

This is often overlooked here (perhaps with good reason as many examples will be controversial). Scenarios of this kind can be very, very bad, much worse than a typical unaligned AI like Clippy.

For example, I would take Clippy over an AI whose goal was to spread biological life throughout the universe any day. I expect this may be controversial even here, but see https://longtermrisk.org/the-importance-of-wild-animal-suffering/#Inadvertently_Multiplying_Suffering for why I think this way.

Reminds me of Crystal Society.

The minority whose volition lay outside of that attractor were gently reprogrammed to like it.

Or there could be...more kinds of sports? Via novel tech? You might not like sports now, but

  • that's basically where video games/VR can be pointed. (Realizing a vision may not be easy, though.)
  • Would you want to go to the moon, and play a game involving human powered gliders at least once?
  • AI was responsible for an event a while back that got way more people watching chess than normally do.

Isn't that the same as the last one?

Just call it a "Status Quo Lock-In" or "Arbitrary Lock-In"

Well, it's intentionally a riff on that one. I wanted one that illustrated that these "shriek" situations, where some value system takes over and gets locked in forever, don't necessarily involve "defectors". I felt that the last scenario was missing something by concentrating entirely on the "sneaky defector takes over" aspect, and I didn't see any that brought out the "shared human values aren't necessarily all that" aspect.

Ah, good point! I have a feeling this is a central issue that is hardly discussed here (or anywhere)

Will MacAskill calls this the "actual alignment problem"

Wei Dai has written a lot about related concerns in posts like The Argument from Philosophical Difficulty

I thoroughly enjoyed this post. Thanks! I particularly loved the twist in the Y2.1K bug

Moral: Deceptively aligned mesa-optimizers might acausally coordinate defection. Possible coordination points include Schelling times, like the beginning of 2100.

Is this considered less likely because coordination is usually used, rather than 'acausal'? (Schelling times also seem like something used with coordination, whether because it's easy to remember, or because it's an easy long-term cultural tool (holidays).)

Stealth Mode

Is this a tale of doom? Nanotech seems like it opens up 'post-scarcity' as an option.

nearly the entire company was run by AI systems.

Oh. I got it the second read through.


Moral: Agents have incentives to make commitments to improve their abilities to negotiate, resulting in "commitment races" that might cause war.

I'm glad this is on a 'less realistic' list. It seems dumb, but...that's kind of what 'doom' is though.


Since there was no distinction between the training environment and the deployment environment,

How is this not 'we don't have a training environment so there is always risk, instead of not having risk during training'?


Moral: AI persuasion tools might alter human values and compromise human reasoning ability, which is also an existential risk.

Interesting to compare this against Memetic Warfare 'without AI' today, and in the past.


Moral: Even if alignment is solved, bad actors can still cause catastrophe.

In this sense, alignment assumes neutrality. (Compare against 'a simulated copy of my brain, which has had the chance to self modify/program/copy for a while.')

Thanks for writing this! Here's another, that I'm posting specifically because it's confusing to me.

Value erosion

Takeoff was slow and lots of actors developed AGI around the same time. Intent alignment turned out relatively easy and so lots of actors with different values had access to AGIs that were trying to help them. Our ability to solve coordination problems remained at ~its current level. Nation states, or something like them, still exist, and there is still lots of economic competition between and within them. Sometimes there is military conflict, which destroys some nation states, but it never destroys the world.

The need to compete in these ways limits the extent to which each actor is able to spend their resources on things they actually want (because they have to spend a cut on competing, economically or militarily). Moreover, this cut is ever-increasing, since the actors who don't increase their competitiveness get wiped out. Different groups start spreading to the stars. Human descendants eventually colonise the galaxy, but have to spend ever closer to 100% of their energy on their militaries and producing economically valuable stuff. Those who don't, get outcompeted (i.e. destroyed in conflict or dominated in the market) and so lose most of their ability to get what they want.

Moral: even if we solve intent alignment, avoid catastrophic war or misuse of AI by bad actors, and other acute x-risks, the future could (would probably?) still be much worse than it could be, if we don't also coordinate to stop the value race to the bottom.