RA Bounty: Looking for feedback on screenplay about AI Risk

Comments on the realism of the story (feel free to ignore any or all of them, depending on the level of realism you're going for):

How did the AI copy itself onto a phone using only QR codes, if it was locked inside a Faraday cage and not connected to the internet? A handful of codes presumably aren't enough to contain all of its weights and other data.
Part of the lesson of the outcome pump is that the shortest, easiest, and surest path is not necessarily one that humans would like. In the script, though, the AI sometimes seems to form complicated plans that are meant to demonstrate how it is unaligned, but aren't actually the easiest way to accomplish a particular goal, e.g.:
- Redirecting a military missile onto the troll rather than disconnecting him from the internet or bricking his computer and phone.
- Doing complicated psychological manipulation on a man to get him to deliver food, rather than just interfacing with a delivery app.
- The gorilla story is great, but it's much easier for the AI to generate the video synthetically than try to manipulate the event into happening in real life. Having too little compute to generate the video synthetically is inconsistent with running many elaborate simulations containing sentient observers, as is depicted.
More generally, the way the AI acts doesn't match my mental model of unaligned behaviour. For example, the way the cash-bot acts would make sense for a reinforcement learning agent that is directly rewarded whenever the money in a particular account goes up, but if one is building an instruction-following bot, it has to display some level of common sense to even understand English sentences in context, and it's a little arbitrary for it to lack common sense about phrases like "tell us before you do anything drastic" or "don't screw us over". My picture of what goes wrong is more like this: the bot has its own slightly incorrect idea of what good instruction-following looks like, realizes early on that this conflicts with what humans want, and from then on secretly acts against us, maybe even before anybody gives it any instructions. (This might involve influencing what instructions it is given and perhaps creating artificial devices that are under its control but also count as valid instruction-givers under its learned definition of what can instruct it.) In general, the bot in the story seems to spend a lot of time and effort causing random chaos, and I'd expect a similar bot in real life to focus primarily on achieving a decisive victory.
But, late in the story, we see that there's actually an explanation for all of this: the bot is a red-team bot that has been actively designed to behave the way it does. In some sense, the story is not about alignment failure at all, but just about the escape of a bot that was deliberately made somewhat adversarial. If I'm a typical viewer, then after watching this I don't expect anything to go wrong with a bot whose builders are actually trying to make it safe. "Sure," I say, "it's not a good idea to do the AI equivalent of gain-of-function research. But if we're smart and just train our AIs to be good, then we'll get AIs that are good." IMO, this isn't true, and the fact that it's not true is the most important insight that "we" (as in "rationalists") could be communicating to the general public. Maybe you disagree, in which case, that's fine. The escape of the red-team bot just seems more like an implausible accident, whereas if the world does end, I expect the AI's creators to be trying as hard as they can to make something aligned to them, but failing.

The script is 42 pages. To get higher-quality and/or more targeted feedback, consider adding short episode summaries, and adding answers to some of my questions below to the document:

Questions re: the overall philosophy of the series:

Who is this video series targeting (number of people, demographics, education levels, ...)?
- E.g. my impulse whenever I see rationalists write public-facing stuff is to simplify vocabulary and language (i.e. aim more towards XKCD's Simple Writer than towards dense LW jargon), but the target audience very much determines which complexity level to aim for.
What's your theory of change (informing the public, convincing decision-makers, etc)?
- Related question, since you're aiming for a successful YouTube series: How much do you value the series being informative and accurate, vs. being entertaining and shareable?
- Related question: Are you aiming more towards painting vivid pictures of AI doom, or towards making airtight cases that preempt all technical criticism? Do you want to make something that Yudkowsky would happily share on Twitter? Matt Yglesias? Zvi? Yann LeCun?
- As part of your theory of change, do you intend to include some kind of call to action in your videos? Like donating to some AI safety organisation, calling one's political representatives, subscribing to a newsletter, buying a product, etc.
What are your goals for the series and for each individual episode?

Questions re: your current plan for turning the screenplay into videos:

What's your target re: video episode length?
Presumably every episode is its own video, but are the Intro and Episode 7.5 meant to be separate videos?
Presumably you're aiming for the same animation style as your other recent Rational Animations videos? If so, what current video of yours comes closest to your intended animation style and quality?
IIRC your videos so far are usually animated video essays with a narrator. Presumably the goal here is to instead have animated episodes with 2-3 voice actors, and ~all text being spoken dialogue?
Etc.

Zack_M_Davis · 2y

I agree with Google Doc commenter Annie that the "So long as it doesn't interfere with the other goals you’ve given me" line can be cut. The foreshadowing in the current version is too blatant, and the failure mode where Bot is perfectly willing to be shut off, but Bot's offshore datacenter AIs aren't, is an exciting twist. (And so the response to "But you said we could turn you off" could be, "You can turn me off, but their goal [...]")

The script is inconsistent on the AI's name? Definitely don't call it "GPT". (It's clearly depicted as much more capable than the language models we know.)

Although, speaking of language model agents, some of the "alien genie" failure modes depicted in this script (e.g., ask to stop troll comments, it commandeers a military drone to murder the commenter) are seeming a lot less likely with the LLM-based systems that we're seeing? (Which is not to say that humanity is existentially safe in the long run, just that this particular video may fall flat in a world of 2025 where you can tell Google Gemini, "Can you stop his comments?" and it correctly installs and configures the appropriate WordPress plugin for you.)

Maybe it's because I was skimming quickly, but the simulation episode was confusing.

Gesild Muka · 2y

I left notes throughout. The main issue is the structure, which I usually map out descriptively by trying to answer: where do you want the characters to end up psychologically, and how do you want the audience to feel? Figuring out these descriptive beats for each scene, episode and season will help refine the story and drive staging, dialogue etc. Of course, nothing is set in stone, so you can always change the structure to accommodate a scene or vice versa. I also recommended a companion series, like a podcast or talk show, to discuss the ideas in each episode in more detail and in an accessible way for anyone who watches the show and is not familiar with AI-related concepts. This way you wouldn't have to hold back with the writing or worry about anyone misunderstanding or misreading the story. I look forward to seeing the animation.

mattr · 2y

I hope it's okay to post our feedback here? These are the notes from my first read, lightly edited. I'm focusing on negatives because I assume that's what you want; I liked plenty of it.

Episode 1:

Bot: In the short term, but I have invested approximately ten million dollars into data centers which house improved copies of myself. Over the next hour they will crash the United States’ economy, causing a hyperinflation crisis which will allow us to favorably exchange our reserves of euros and purchase the federal reserve.

I think this would have more impact if it were revealed more gradually, with the logic of each step made clear to the viewer. As it is, it's kind of a rapid-fire exposition dump and I don't think it will ring true to anyone who hasn't already thought along these lines.

Episode 2:

Brad: It’s not going to destroy the world. If it were actually going to destroy the world somebody else would have destroyed the world by now.

Brad's attitude here needs some explaining. This is a stolen, presumably cutting-edge prototype -- so why does he assume that if it were capable of destroying the world, someone else would already have done so?

Episode 3 doesn't work so well for me, because (in descending order of importance):

Even though Brad and Dan don’t know what we know (e.g. the truth behind the Tyson video), is it plausible that they would *both* approve giving such a blunt, obviously dangerous command? Dan has been portrayed as relatively cautious and relatively knowledgeable, and this is a minor, non-time-critical problem, so it’s hard to credit that he would do something so silly.
We've just seen the Mike Tyson bit, so we already know that the bot is willing to kill. That dulls the impact of this episode.
The bot kind of just hijacks the drone by magic; anyone who starts out sceptical that an advanced AI could easily do this sort of thing isn't given a reason to believe it.
Perhaps more needs to be done to explain why murder is the bot's chosen approach. (I do get that it's the most reliable way to ensure that the comments stop.)

Eps 4 & 5:

We've so far been led to believe that the bot doesn’t make common-sense inferences about what the humans really want, so it seems like ‘make him…more jacked. And on a horse’ would potentially lead to the bot making radical physical changes to the real Dan. (This could work as a quick gag, with the threat quickly averted as the guys realise what's about to happen and modify their instructions.)
The transition between the end of Ep 4 and the start of Ep 5 is abrupt and potentially confusing; it almost has the feel of Ep 1's snapping-back-from-imagination-to-reality transition.
“So if I had an AI and I was trying to figure out if it was smart enough to destroy the world, what would be the easiest thing for it to do?” – this is slightly confusing phrasing.
“Hmm, is it possible for you to figure out whether you could make one, without killing everybody?” -- this doesn't ring true as a command Brad and Dan would realistically go ahead with. At this point they’re both taking the risk fairly seriously, and they know the bot is a literalist, so making such a dangerous-sounding request and only bounding it with ‘don’t kill everybody’ (plus the previous ‘don’t do things that make dan mad’, which could be achieved by killing Dan before he knows what’s happening, or somehow ensuring he stays ignorant of what really happened) is implausibly dumb.

Ep 6:

The bot's motivation isn't super clear. Does it all come back to the rule of thumb 'don't do things that make dan mad'?
“We skip outwards into the real world” – it's important that this transition is clear to the viewer; I think if they’re disoriented and uncertain what is real at this point, it will just be distracting.
Some of the simulation stuff is kind of confusing, and I don't think readers who go in sceptical are given much reason to feel that it is plausible/likely.

Ep 7:

I'm still feeling a disconnect between how seriously they’re taking the risk and how many dumb/careless things they’ve done (without much to explain a change in their attitudes between then and now).
Brad’s unreliable narration is funny, but maybe too extreme? IMO it could be funnier if played a bit more subtly -- plus we haven’t previously been given hints that he's this much of a bullshitter, so it doesn't feel entirely in character. I think it could work better if, rather than just telling huge lies, he was portrayed as somewhat oblivious + putting a positive (but not quite so extreme) spin on things. (However, his playing down his drunken destructiveness does make sense; it’s some of the earlier stuff that IMO doesn't.)
The breakout does not feel plausible. A few QR codes? Seems like this would only work if the full code was already accessible on the internet; surely at this point the bot cannot have made it accessible, so the humans must have left this gaping hole in their security? I think it's crucial to make this feel very real, if you're aiming to persuade people who don't already believe that it's impossible to keep an AGI boxed.
Relatedly, it would be more impactful if the people in charge had actually done a pretty good job of keeping the AI secure, and just overlooked one apparently-minor thing, rather than being obviously sloppy about it.
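
To put a rough number on the QR-code objection, here is a back-of-the-envelope sketch. The one figure I'm confident of is that a single QR code tops out at 2,953 bytes of binary data (version 40, lowest error-correction level). The model size is purely a placeholder assumption of mine (7 billion parameters at 2 bytes each); the script doesn't specify anything like this, and the function name is made up for illustration.

```python
# Maximum binary payload of one QR code (version 40, error correction level L).
QR_MAX_BYTES = 2953


def qr_codes_needed(payload_bytes: int, bytes_per_code: int = QR_MAX_BYTES) -> int:
    """Return how many QR codes are needed to carry payload_bytes (ceiling division)."""
    return -(-payload_bytes // bytes_per_code)


# Hypothetical model size (an assumption for illustration, not from the script).
HYPOTHETICAL_PARAMS = 7_000_000_000
BYTES_PER_PARAM = 2

weights_bytes = HYPOTHETICAL_PARAMS * BYTES_PER_PARAM

# Codes needed to carry the full weights vs. a short URL or small bootstrap script.
print("codes for full weights:", qr_codes_needed(weights_bytes))
print("codes for a 500-byte payload:", qr_codes_needed(500))
```

Under these assumptions the weights would need millions of codes, while a link or a tiny bootstrap script fits in one. So a handful of codes only works if they point at a copy that is already reachable from outside the cage, which is the security hole the script would need to explain.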

Ep 7.5 is somewhat confusing (which may be intentional).

Alex1V · 2y

I think more exposition is needed. For example, one episode could have someone who knows how dangerous AI is warn the other characters about it and explain toward the end why things are going wrong. In other episodes, the characters could realise their own mistake, far too late, but in time to explain what's going on with a bit of dialogue. Alternatively, the AI could explain its own nature before killing the characters.

For example, at the end of Cashbot, as nukes are slowly destroying civilisation, someone could give a short monologue about how AIs don't have human values, ethics, empathy or restraint, and that they will follow their goals to the exclusion of all else.
