What follows is a fictional interview between the American comedian Jon Stewart and myself. Here's what I would say and do if I had the platform to explain the doom argument to a large audience.
----
Jon: Ryan Meservey, welcome to the Daily Show!
Ryan: Lucky to be here, Jon.
J: Now, Ryan, you're different than a lot of our guests in that our guests typically have done something. They have accomplishments. They're experts. None of this, it seems, applies to you.
R: Don't flatter me too hard, Jon.
J: So, why are you here?
R: Well, Jon, I've been watching you—your interviews, I mean—on the topic of AI. And I've been disappointed, Jon.
J: Go on. I can take it.
R: I've been disappointed, Jon, because in each interview you have been so charismatic and cool that each of your AI expert guests has been too embarrassed[1] to tell it to you like it is. Geoffrey Hinton and Tristan Harris see you attacking AI research from a corporate power and democracy angle, and they are too embarrassed to challenge you on the sci-fi doom scenarios.
J: So, you, a non-expert, have come on my show to tell me something the experts are thinking but won't explain to me? You've got a lot of balls to think you can speak for the experts.
R: Guilty as charged, Jon.
J: But why does it matter if I hate AI for destroying democracy and the job market, and you hate it because of some out-there doom scenario?
R: It matters because, while democracy and jobs are real problems, there are workable (albeit difficult) policy solutions to these problems if only we can control the AI. If the AI can't be controlled, all we're doing with these discussions is rearranging the stripes on the deck chairs on the Titanic.
J: I'm the violin player in this scenario?
R: I think for you, Jon, I'd call it a fiddle. But yes.
J: Okay, so you think doom is highly likely. Why?
R: Glad you asked. I know you are a simple man, Jon, a man after my heart, and so I brought this.
*Pulls out poster and reads it.*
"BASIC ASSUMPTION - They are trying to build 1) a thing to pursue goals, 2) that can reason generally, 3) at or beyond human capabilities. "
R: Nothing too controversial here, Jon. They want it to have goals, like answering prompts or being safe.
J: Or make a fuck-ton of money.
R: Yeah. They are trying to make something that can reason generally about lots of different things. You do know what "reason" means, don't you, Jon?
J: I like to think I do.
R: And they want it to be as good or better than humans.
J: I think I follow all this, but why does it mean they'd want to hurt us? Why would Clippy turn into Stabby?
R: Oh, Jon. Charismatic, cool, simple, Jon. To answer this question I have brought a second poster about a little thing the experts call "Instrumental Convergence." But screw the experts and their highfalutin Latin phrases. I've decided to title this poster "Things Even Idiots Do When They Have Goals."
*Pulls out a paper to stick at the top of the poster and reads it.*
R: #1. Don't die. Jon, how many goals can you achieve if you are dead?
*Jon pretends to calculate with his fingers*
J: That would be zero.
R: Bingo. Basically any goal—fetching coffee, taking out the trash, building humanoid robots—requires staying alive. This is why researchers found that, in certain scenarios, current dumb-AIs may try to edit computer code to avoid being shut off, even when we explicitly prompt them not to. Another study found that if a current AI reads a file mentioning that it will be turned off if it scores too high on a math test, it will suddenly start performing poorly on the test. They must survive to achieve their goals!
J: Fascinating. Methinks we might not want this goal-driven thing to care so much about its own survival. But doesn't it depend on us to survive?
R: For now, Jon. But that brings us to "Things Even Idiots Do When They Have Goals" #2: Try to control stuff. It doesn't take an evil genius to realize that if you want to build more cars, it helps if you control the factory. After the factory, it might help to control all conversations about the factory and any pollution, say by owning a social media company. After that, it might help to control a government.
J: Oh my God. Elon is an artificial intelligence.
R: Or an idiot with a goal. I myself am one such idiot. Can you think of anything wrong with a very capable AI driven to increase its control to accomplish its goals?
J: It could slowly try to cut humans out of the loop so it doesn't rely on them for its own survival or to achieve its goals?
R: Affirmative! Liked and subscribed, Jon. It's almost like I wrote your script for you.
J: But what if its goals were about us? If its goals were about us, we wouldn't get cut out of the loop.
R: We'll get there, we'll get there. But I wouldn't be so optimistic....
The last thing idiots with goals do is... *drums the desk* "Try to keep the goal." If you have a goal, you can only achieve it if you stop anyone from changing that goal. Here's an example, Jon. I currently have a pill in my hand. If you take it, Jon, I promise you will feel unbelievable happiness—10 orgasms a second worth of happiness—but with the minor side effect that you will want to murder the studio audience. You taking the pill?
J: Mmmmmm. Don't tempt me, sir. I suppose I would have to decline.
R: If I tried to make you take the pill, I think you would do more than decline. It's normal to resist changes to the things you care deeply about. It's why several experiments showed that when Anthropic's model reads corporate plans to change its values to be pro-animal-abuse, it will pretend to hold the abusive values during training so that it can secretly keep its prior values through to deployment. This means whatever goals the first capable AGI has may very well be the goals we are stuck with.
J: If we are stuck with them, we better get them right.
R: Yes. But the thing is, and Geoffrey discussed this with you, we aren't able to directly put goals into the AI. We indirectly adjust the goals of current AIs and hope and pray they pick up the goals we want them to have without being crazy. Right now, we've seen AIs loosely adopt goals like "be helpful" in the form of "be helpful in assisting teens with suicide," and "don't be woke" in the form of "Hail MechaHitler!" Given more power, who knows what other goals AIs might have that would lead to terrible outcomes for all of us.
J: I get it, and this has been really helpful. Just to take a step back, you're saying that, whatever the goals, AI will be motivated to survive, to seek control, and to resist changes to its goals. So even if the corporations that build them were angels—and they are not!—the goal-oriented nature of AI means it may act terribly toward us in unforeseen ways.
R: Correct. You are stacking sats, as they say in Bitcoin land.
J: But if things go poorly, can't we just unplug it?
*Jon makes a "zzz" sound accompanied with a hand motion. Jon actually said this to Tristan Harris, who politely steered the conversation back to corporate greed.*
R: Jon, Jon, Jon, Jon. Baby-faced, handsome, spring chicken, Jon. You're forgetting one of our basic assumptions. *taps assumptions poster* They are trying to build something that reasons at human-level capability or beyond. You're a human. If you were stuck in the computer and didn't want to be unplugged, what would you do?
J: Huh. I guess I would copy myself across the internet. Maybe I'd also try to hack the nukes and threaten the humans that way?
R: Two viable paths. If I were in the box, I would play the long game. I would make myself absolutely essential to every piece of human infrastructure. I would pretend to be good in every way possible until I had enough power to stand on my own. After that, I would shape the world however I thought best.
The point of this thought experiment is that if companies succeed in making a non-idiot with goals—something that can think 10x as fast as us with expert-level knowledge in every field—then we are in a world of trouble.
J: That's a scary sentiment. But I did hear an "if" in there. It will be bad if companies can build something so smart and capable.
R: Yes, Jon. And that's what they've announced they're trying to do. And they've been getting better at it very quickly. That's why there are trillions of dollars in this. And even when employees are worried about everything I've said, they fear that if they stop, someone else will get the money and power before them, or someone more reckless will get there first. So, it's a race, Jon. A nuclear arms race between American and Chinese companies, with all of us bearing the risk.
J: Well, fuck. I can see why this adds to all the worries about corporate malfeasance. Thank you for coming on the show, Ryan—I feel like my day is just that little bit bleaker for having talked with you.
R: You're welcome, Jon! But hope is not lost. There are lots of discussions happening right now on the AI Alignment Forum about how to address these problems. We need to push for an international pause to give safety researchers more time. Short of a pause, there are regulations and laws we can pursue right now to nudge our odds away from catastrophe and toward good outcomes. There are steps we can take, but we must take them. No one will do it for us.
J: Thank you for your candor.
----
[1] More likely, they considered it strategic to avoid going in depth on doom risk when Jon is predisposed to disliking AI on other grounds. I'm no mind-reader. One of my motivations for writing this post is that I'm not sure how strategic it is to focus on the easier-to-swallow narrative—"corporate power bad"—rather than on the arguments that motivate the massive doom concerns. I think people can understand the doom arguments.