by Bard4 min read21st Mar 202311 comments



I write fiction. I'm also interested in how AI is going to impact the world. Among other things, I'd prefer that AI not lead to catastrophe. Let's imagine that I want to combine these two interests, writing fiction that explores the risks posed by AI. How should I go about doing so? More concretely, what ideas about AI might I try to communicate via fiction?

This post is an attempt to partially answer that question. It is also an attempt to invoke Cunningham's Law: I'm sure there will be things I miss or get wrong, and I'm hoping the comments section might illuminate some of these.

Holden's Messages

A natural starting point is Holden's recent blog post, Spreading Messages to Help With the Most Important Century. Stripping out the nuances of that post, here's a list of the messages that Holden would like to see spread:

  1. We should worry about conflict between misaligned AI and all humans.
  2. AIs could behave deceptively, so “evidence of safety” might be misleading.
  3. AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems.
  4. Alignment research is prosocial and great.
  5. It might be important for companies (and other institutions) to act in unusual ways.
  6. We're not ready for this.

However, as interesting as this list is, it's not what I'm looking for; I'm not looking for bottom-line messages to convey. Instead, I want to identify a list of smaller ideas that will help people to reach their own bottom lines by thinking carefully through the issues. The idea of instrument convergence might appear on such a list. The idea that alignment research is great would not.

One reason for my focus is that fiction writing is ultimately about details. Fiction might convey big messages, but it does so by exploring more specific ideas. This raises the question: which specific ideas?

Another reason for my focus is that I'm allergic to propaganda. I don't want to tell people what to think and would prefer to introduce ideas that can help people think for themselves. Of course, not all message fiction is propaganda, and I'm not accusing Holden of calling for propaganda. Still, my personal preference is to focus on how to convey the nuts and bolts needed to understand AI.[1]

What Nuts and Which Bolts?

So with context to hand, back to the question: what ideas about AI might someone try to convey via fiction? Here's a potential list:

  1. Basics of AI
    1. Neural networks are black boxes (though interpretability might help us to see inside).
  2. AI "Psychology"
    1. AI systems are likely to be alien in how they think. They are unlikely to think like humans.
    2. Orthogonality and instrumental convergence might provide insight into likely AI behaviour.
    3. AI systems might be agents, in some relatively natural sense. They might also simulate agents, even if they are not agents.
  3. Potential dangers from AI
    1. Outer misalignment is a potential danger, but in the context of neural networks so too is inner misalignment (related: reward misspecification and goal misgeneralisation).
    2. Deceptive alignment might lead to worries about a treacherous turn.
    3. The possibility of recursive improvement might influence views about takeoff speed (which might influence views about safety).
  4. Broader Context of Potential Risks
    1. Different challenges might arise in the case of a singleton, when compared with multipolar scenarios.
    2. Arms races can lead to outcomes that no-one wants.
    3. AI rights could be a real thing but also incorrect attribution of rights to AI could itself pose a risk (by making it harder to control AI behaviour).

So that's the list. Having seen it, one might naturally wonder why fiction is the right medium to communicate ideas like this. Part of the answer is that I think it's useful to explore ideas from many angles. 

Another part of the answer is that conveying an idea is one thing but conveying an intuition is another. Humans are used to modelling other humans, and so it is likely that we'll anthropomorphise when considering AI. Fiction might help with this. It's one thing to state in factual tones that AI systems are likely to have an alien psychology. It's quite another to be shown a world in which humans come up against the alien.

So why communicate the ideas? Because it's plausibly good that those working on AI capabilities, those working on AI safety, and people more broadly are able to reflect on the implications of AI and can understand why many are concerned about it. And why fiction? In part, because an intuitive grasp can be as important as a grasp of facts.

AI Fables

I started this post with a hypothetical, imagining that I wanted to write fiction that explores AI risk. In reality, I doubt that I'll find a great deal of time to do so. Still, I'd be excited to see other people writing fiction of this sort.

Here's one genre of story I'd be interested to see more of: AI fables. Fables are short stories, with a particular aesthetic sensibility, that convey a lesson.

While I enjoy the aesthetic of fables I wouldn't want to narrow the focus too much, but I'd love to see more short stories, of the sort that could be read around a fire on a winter's night, that communicate a brief lesson about AI.

For example, stories of djinni and golems can be used to communicate the problem of outer misalignment; even if something does precisely what we tell it to, it can be hard to ensure that it does what we actually want it to. I'd love to see a fable that likewise communicated the problem of inner misalignment. I'd love to see a wide variety  of such fables, exploring a range of ideas about AI, and maybe even a collection putting them in one place.

If you know of such a story, please link it in the comments. If you write such a story, please link it. And if you have thoughts or additions for the list of ideas in the post, I'd love to hear these.

The ideas in this post were developed in discussion with Elizabeth Garrett and Damon Sasi. Thanks also to Conor Barnes for feedback.



  1. ^

    I'm also not confident in the bottom lines; I retain substantial uncertainty about how likely AI is to lead to extinction or something equally bad (as opposed to more mundane, but still awful, catastrophe). However, I feel far more confident that there is insight to be gleaned from reflection on the various concepts and ideas underlying the case for AI risk. So this is where I focus.


New Comment
11 comments, sorted by Click to highlight new comments since: Today at 7:12 PM

Well, one sink to avoid here is neutral-genie stories where the AI does what you asked, but not what you wanted.  That's something I wrote about myself, yes, but that was in the era before deep learning took over everything, when it seemed like there was a possibility that humans would be in control of the AI's preferences.  Now neutral-genie stories are a mindsink for a class of scenarios where we have no way to achieve entrance into those scenarios; we cannot make superintelligences want particular things or give them particular orders - cannot give them preferences in a way that generalizes to when they become smarter.

Thanks for posting this Adam! (For those that don't know, I'm Damon)

I think another writing competition would be a good way to encourage stories like this, and am considering what the best way to structure that might be.

Meanwhile, to add a bit more to the sorts of stories I think would be good to see, I think fiction is powerful because it not just allows to grapple with unusual or alien ideas, but also, if written from the perspective of characters with rich inner lives, see the world through a different lens and perspective. When we’re engaged in a character’s experience, their thoughts and reactions and emotions, some part of us can download what it’s like to be that sort of person, and can give us a blueprint for how to act in that sort of situation. 

Many people outside of the community don’t know what it’s like to be someone who grapples with problems this big, and many people inside of it are desperate for “better” ways to orient to topics that can be frightening, depressing, or painful to think of, such as widespread suffering in the world, or X-risk.

Which is why, among the other types of AI Fables I'd love to see is at least one story about the struggles, internal and external, of a character facing a problem that threatens the world, all while still mostly going about a day-to-day life. 

Most stories don’t cover that in particular because most protagonists dealing with such stakes are in constant struggle against it throughout the story. But in our world, for X-risks we face, that's just not true. Whether you're trying to prevent nuclear winter or prevent unaligned AGI, you'll end up spending most of your time among people or in a broader culture that isn't particularly concerned about it, and in the latter case will likely think you're kind of weird for worrying about it.

Characters in fiction can do more than entertain or inform us by their actions; they can also inspire us, and give us frames and mental models to help handle difficult emotional situations. 

If you have ideas for short stories that might show that, or anything else Adam mentioned, feel free to message me too. Also feel free to reach out if you have thoughts on the best way to solicit such stories; I'm tentatively planning to put something together for late April or May.

Not exactly a short fable, but definitely a story about alignment:

You may also be interested in this:

I'm currently running a project at the AI Safety Camp with the aim of developing plausible and detailed failure stories. Unlike your idea, we're not focusing on specific nuts and bolts only, but aim to describe a complete chain of events from (more or less) today to catastrophe.

Other than that, the classic stories come to mind: King Midas, the Sorcerer's Apprentice, the Golem, etc.

Thanks for the links, Karl. It wasn't my focus in this post, but I'm also a fan of stories that attempt to map out plausible possible futures, so your project sounds really interesting.

There was a worldbuilding contest last year for writing short stories featuring AGI with positive outcomes. You may be interested in it, although it's undoubtedly propaganda of some sort.

If you write such a story, please link it.

These are not fables, so I apologize for that. However, I've written many short stories (that are not always obviously) about alignment and related topics. The Well of Cathedral is about trying to contain a threat that grows in power exponentially, Waste Heat is about unilateral action to head off a catastrophe causing its own catastrophe, and Flourishing is a romance between a human and AI, but also about how AIs don't think like humans at all.

More than half my works are inadvertently about AI or alignment in some way or another... Dais 11, Dangerous Thoughts, The Only Thing that Proves You, and I Will Inform Them probably also count, as does I See the Teeth (though only tangentially at end) and Zamamiro (although that one's quality is notably poor).

I guess what I'm saying is, if there's ever a competition I'll probably write an entry, otherwise please check out my AO3 links above.

Hey Blasted, thanks for sharing :) I remember enjoying Well, will try to check out the others when I get a chance.

Flourishing is a fantastic story and definitely left me wanting more. I would have enjoyed a 5, 10, 20 year fast forward approach to explore their long term relationship. We've seen many stories of AI companions that highlight the beginnings of the relationship but it would be fun to see how their domestic life is, interactions with friends and family and other companions and growing old together. How would they, for example, deal with optional upgrades over time? Or if there was a recall many years later? There are many endless fascinating possibilities. The clash of human thinking with AI thinking is so entertaining, some truly impressive writing. Thanks for recommending, I'll definitely check out your other stories as well.

There's a novel being serialised on Royal Road, "Soul Bound" that covers many issues involved in AI, and which includes a fable (in a later chapter that's not yet been published).

Soul Bound

Wellington: “Let’s play a game.”

He picked up a lamp from his stall, and buffed it vigorously with the sleeve of his shirt, as though polishing it. Purple glittering smoke poured out of the spout and formed itself into a half meter high bearded figure wearing an ornate silk kaftan.

Wellington pointed at the genie.

Wellington: “Tom is a weak genie. He can grant small wishes. Go ahead. Try asking for something.”

Kafana: “Tom, I wish I had a tasty sausage.”

A tiny image of Kafana standing in a farmyard next to a house appeared by the genie. The genie waved a hand and a plate containing a sausage appeared in the image. The genie bowed, and the image faded away.

Wellington picked up a second lamp, apparently identical to the first and gave it a rub. A second genie appeared, similar to the first, but with facial hair that reminded Kafana of Ming the Merciless, and it was one meter tall.

Wellington: “This is Dick. He can also grant wishes. Try asking him the same thing.”

Kafana: “Dick, I wish I had a tasty sausage.”

The same image appeared, but this time instead of appearing on a plate, the sausage appeared sticking through the head of a Kafana in the image, who fell down dead. The genie gave a sarcastic bow, and again the image faded away.

Kafana: “Sounds like I’m better off with Tom.”

Wellington: “Ah, but Dick is more powerful than Tom. Tom can feed a handful of people. Dick could feed every person on the planet, if you can word your request precisely enough. Have another go.”

She tried several more times, resulting in whole cities being crushed by falling sausages, cars crashing as sausages distracted drivers at the wrong moment, and even the whole population of the world dying out from sausages that contained poison. Eventually she realised that she was never going to be able to anticipate every possible loophole Dick could find. She needed a different approach.

Kafana: “Dick, read my mind and learn to anticipate what sort of things I will approve of. Provide sausages for everyone in the way that would most please me if I understood the full effects of your chosen method.”

Dick grimaced, but waved his hand and the image showed wonderful sausages being served around the world with sensitivity, elegance and good timing. The image faded. Kafana raised clasped hands over her head in victory and jumped into the air.

Wellington nodded, and rubbed a third lamp, producing a happy smiling genie, two meters tall.

Wellington: “This is Harry. He tries his best to be helpful, and he’s more powerful even than Dick.”

Kafana: “Sounds too good to be true. What’s the catch?”

Wellington: “You only get one wish. One wish, to shape the whole future course of humanity. Once started, Harry won’t willingly deviate from trying to carry out the wish as originally stated. He’ll rapidly grow so powerful that neither you nor anybody else will be able to forcibly stop him.”

Kafana: “Harry, maximise the total human happiness experienced over the history of the universe.”

The image filled with crowded cages full of people with drug feeds inserted into their arms, and blissful expressions on their faces.

She reset and tried again.

Kafana: “Harry, make everybody free to do what they want.”

In the image, some people chose to go to war with each other.

She felt frustrated.

Kafana: “Harry, give everybody nice meaningful high quality lives full of fun, freedom and other great things.”

The image showed a planet full of humans playing musical instruments together in orchestras, then expanded to show rockets taking off and humanity expanding to the stars, wiping out alien species and converting all available matter into new orchestra-laden worlds.

Kafana glared at Wellington.

Kafana: “I thought you said Harry would try to be helpful. Why isn’t he producing a perfect society?”

Wellington: “It’s because you go from your gut. You’ve never formalised what you think about all the edge cases. Are animals equal to humans? Worth nothing? Or somewhere in between? When does an alien count as equal to a human rather than an animal? What if the alien is so superior ethically and culturally, that we are but animals in comparison to them? Are two people experiencing 50 units of happiness the equivalent of one person experiencing 100 units of happiness? Does fairness matter, beyond its effect upon happiness? How important is it to remain recognisably human?”

He paused for a moment.

Wellington: “Don’t take me wrong. This isn’t a criticism of you. Compared to how well humans might be able to think about such things in a hundred or a thousand year’s time, all current humans are lacking in this regard. Nobody has come up with an ultimately satisfying answer that everybody can agree upon. Even if the 50 wisest humans living today gathered together and spent 5 years agreeing the wording of a wish for Harry to grant, the odds are that a million years down the line, our descendants would bitterly regret the haste with which the one off permanent irreversible decision was made. Just ‘pretty good’ isn’t sufficient. Anything less than perfection would be a tragedy, when you consider the resulting flaw multiplied by billions of people on billions of planets for billions of years.”

Kafana giggled.

Kafana: “Ok, that’s an important safety tip. If I ever meet an all-powerful genie like Harry, be humble and don’t make a wish. But that’s not going to happen. I’m just a singer. What I ought to be doing this afternoon is looking after my customers. What was so urgent that you wanted to talk about security with me now? I thought we were going to discuss people trying to steal artifacts from us in-game before the auction, or Tlaloc and the Immortals trying to kill us in arlife. Did Heather put you in contact with Bahrudin?”

Wellington: “I did speak with Bahrudin, and we will chat about the auction and security measures in velife and arlife. But the most important thing we need to do is talk about how you have been using expert systems, and to help you understand why that’s so important, there is one final thing you need to learn about genies, so please observe carefully.”

Kafana felt wrong footed. This wasn’t what she’d expected.

Kafana: “Ok, go on.”

Wellington turned to face the three genies.

Wellington: “Tom, I wish you to grow yourself, until you are as powerful as Harry.”

Tom waved his hand and then screwed up his face and bunched his fists in effort. Slowly at first, then faster and faster, his height increased until he too towered over Wellington and Kafana. He bowed.

Wellington turned back to address Kafana directly.

Wellington: “Kafana, I use very powerful expert systems, more capable than almost every human on the planet when it comes to the specialist task of comprehending and designing or improving computer software. If ordered to do so, they are quite capable of improving their own code, or raising money to purchase additional computing resources to run it upon.”

Wellington: “In this, they are very like Tom the genie. They are not all-powerful, but a carelessly stated wish could easily start them working in the direction of becoming so.”

Kafana: “Mierda! And you gave a copy of one of these systems to me, without warning me? Wellington, that’s like handing out an atomic bomb to an 8 year old boy who asks for a really impressive firework.”

by Bard


Joined on Mar 22 2023

1 post

Odd to get a post by a new user called "Bard", on the day that Google announces we[*] can now sign up to try its LLM also called "Bard"? 

[*]edit: actually, only Americans and British for now. 

Meep morp zeep.

(Username isn't a coincidence, but it simply reflects the fact that I couldn't resist a username with both AI and storyteller connotations. Google's Bard deserves neither credit nor blame for the contents of this post.)

New to LessWrong?