I think it's a great video! I do also wish it had bound itself less to one specific organization; I feel like it would stand the test of time better (and be less likely to end up betraying people's trust) if it had given a general overview of what we can do about AI risk, instead of ending with a call to action to support/join ControlAI in particular.
It's true that a video ending with a general "what to do" section instead of a call-to-action to ControlAI would have been more likely to stand the test of time (it wouldn't be tied to the reputation of one specific organization or to how good a specific action seemed at one moment in time). But... did you write this because you have reservations about ControlAI in particular, or would you have written it about any other company?
Also, I want to make sure I understand what you mean by "betraying people's trust." Is it something like, "If in the future ControlAI does something bad, then, from the POV of our viewers, that means that they can't trust what they watch on the channel anymore?"
I have reservations about ControlAI in particular, but also endorse this as a general policy. I think there are organizations that themselves would be more likely to be robustly trustworthy and would be more fine to link to, though I think it's actually very hard and rare, and I would still avoid it in general (the same way LW has a general policy of not frontpaging advertisements or job postings for specific organizations, independent of the organization)[1].
Also, I want to make sure I understand what you mean by "betraying people's trust." Is it something like, "If in the future ControlAI does something bad, then, from the POV of our viewers, that means that they can't trust what they watch on the channel anymore?"
Yeah, something like that. I don't think "does something bad" is really the category; it's more something like "viewers will end up engaging with other media by ControlAI which does things like riling them up about deepfakes in a bad-faith manner (i.e. not actually thinking deepfakes are worth banning, but seeing a deepfake ban as helpful for slowing down AI progress, without being transparent about that), and then they will have been taken advantage of, and then this will make a lot of coordination around AI x-risk stuff harder".
[1] We made an exception with our big fundraising post, because Lightcone disappearing does seem of general interest to everyone on the site, but it made me sad and I wish we could have avoided it.
I think there are organizations that themselves would be more likely to be robustly trustworthy and would be more fine to link to
I would be curious for your thoughts on which organizations you feel are robustly trustworthy.
Bonus points for a list that is kind of a weighted sum of "robustly trustworthy" and "having a meaningful impact RE improving public/policymaker understanding". (Adding this in because I suspect that it's easier to maintain "robustly trustworthy" status if one simply chooses not to do a lot of externally-focused comms, so it's particularly impressive to have the combination of "doing lots of useful comms/policy work" and "managing to stay precise/accurate/trustworthy").
The way we train AIs draws on fundamental principles of computation that suggest any intellectual task humans can do, a sufficiently large AI model should also be able to do. [Universal approximation theorem on screen]
IMO it's dishonest to show the universal approximation theorem. Lots of hypothesis spaces (e.g. polynomials, sinusoids) have the same property. It's not relevant to predictions about how well the learning algorithm generalises. And that's the vastly more important factor for general capabilities.
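To make that concrete, here's a minimal NumPy sketch (the target function and degrees are arbitrary illustrative choices): a plain least-squares polynomial fit also drives the approximation error toward zero as the degree grows, and none of that says anything about how it generalises.

```python
# Toy demonstration: polynomials are also "universal approximators".
# Fit an arbitrary target on [-1, 1] with least-squares polynomials of
# increasing degree and watch the fit error shrink. None of this says
# anything about behaviour off the training interval, i.e. generalisation.
import numpy as np

x = np.linspace(-1.0, 1.0, 500)
y = np.sin(5 * x)  # arbitrary target function

for deg in (3, 5, 9, 15):
    coeffs = np.polyfit(x, y, deg)                       # least-squares fit
    max_err = np.max(np.abs(np.polyval(coeffs, x) - y))  # error on the fit interval
    print(f"degree {deg:2d}: max fit error = {max_err:.5f}")
```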
I agree it’s not a valid argument. I’m not sure about ‘dishonest’ though. They could just be genuinely confused about this. I was surprised how many people in machine learning seem to think the universal approximation theorem explains why deep learning works.
Good point, I shouldn't have said dishonest. For some reason while writing the comment I was thinking of it as deliberately throwing vaguely related math at the viewer and trusting that they won't understand it. But yeah, it's likely just a misunderstanding.
This is very late, but I want to acknowledge that the discussion about the UAT in this thread seems broadly correct to me, although the script's main author disagreed when I last pinged him about this in May. And yeah, it was an honest mistake. Internally, we try quite hard to make everything true and not misleading, and the scripts and storyboards go through multiple rounds of feedback. We absolutely do not want to be deceptive.
It's not relevant to predictions about how well the learning algorithm generalises. And that's the vastly more important factor for general capabilities.
Quite tangential to your point, but the problem with the universal approximation theorem is not just "it doesn't address generalization" but that it doesn't even fulfill its stated purpose: it doesn't answer the question of why neural networks can space-efficiently approximate real-world functions, even with arbitrarily many training samples. The statement "given arbitrary resources, a neural network can approximate any function" is actually kind of trivial - it's true not only of polynomials, sinusoids, etc., but even of a literal interpolated lookup table (if you have an astronomical space budget). It turns out the universal approximation theorem requires exponentially many neurons (in the input dimension) to work, far too many to be practical - in fact, that's the same amount of resources a lookup table would cost. This is fine if you want to approximate a 2D function or something, but it goes nowhere toward explaining why even a space-efficient MNIST classifier is possible. The interesting question is: why can neural networks efficiently approximate the functions we see in practice?
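(A toy sketch of the lookup-table point, with an arbitrary target function and grid sizes: in one dimension a plain interpolated table drives the error to zero just fine; the catch is that the same construction needs roughly k^d entries for d input dimensions, which is the same sort of exponential budget the worst-case UAT constructions require.)

```python
# Toy sketch: an interpolated lookup table is also a "universal approximator"
# in the trivial sense -- refine the grid and the error goes to zero.
# The catch is cost: k points per axis means k**d table entries in d dimensions.
import numpy as np

def table_approx(f, grid_size, x):
    """Approximate f on [0, 1] by linear interpolation over a uniform grid."""
    xp = np.linspace(0.0, 1.0, grid_size)  # table keys
    fp = f(xp)                             # table values
    return np.interp(x, xp, fp)            # piecewise-linear lookup

f = lambda t: np.sin(10 * t) * np.exp(-t)  # arbitrary 1D target
x_test = np.linspace(0.0, 1.0, 10_000)

for k in (10, 100, 1000):
    err = np.max(np.abs(table_approx(f, k, x_test) - f(x_test)))
    print(f"{k:5d} table entries: max error = {err:.5f}")

# Cheap in 1D; for a 784-dimensional input like MNIST the same construction
# would need k**784 entries -- the "astronomical space budget" mentioned above.
```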
(It's a bit out of scope to fully dig into this, but I think a more sensible answer is something in the direction of "well, anything you can do efficiently on a computer, you can do efficiently in a neural network" - i.e. you can always encode polynomial-size Boolean circuits into a polynomial-size neural network. Though there are some subtleties here that make this a little more complicated than that.)
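(And to make that concrete, here's a minimal, purely illustrative sketch of the encoding: each Boolean gate costs a constant number of ReLU units on {0, 1}-valued inputs, so a polynomial-size circuit becomes a polynomial-size network; XOR is just a small example circuit.)

```python
# Minimal sketch of the "Boolean circuit -> ReLU network" encoding:
# each gate costs O(1) ReLU units on {0, 1}-valued inputs, so a
# polynomial-size circuit becomes a polynomial-size network.
import itertools
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def AND(a, b):   # one ReLU unit
    return relu(a + b - 1.0)

def NOT(a):      # one ReLU unit
    return relu(1.0 - a)

def OR(a, b):    # via De Morgan: OR(a, b) = NOT(AND(NOT(a), NOT(b)))
    return NOT(AND(NOT(a), NOT(b)))

def XOR(a, b):   # a small circuit composed from the gates above
    return AND(OR(a, b), NOT(AND(a, b)))

for a, b in itertools.product([0.0, 1.0], repeat=2):
    print(f"XOR({int(a)}, {int(b)}) = {XOR(a, b):.0f}")
```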
That’s why experts including Nobel prize winners and the founders of every top AI company have spoken out about the risk that AI might lead to human extinction.
I'm unaware of any statement to this effect from DeepSeek / Liang Wenfeng.
That's fair. We wrote that part before DeepSeek became a "top lab", and we failed to notice there was an adjustment to make.
fantastic video!
only complaint: you run through the timeline like 4 times with little transition between each 'time reset', perhaps confusing a person not familiar with the arguments/timeline; they may not immediately realize you are pulling back to a previous point in the timeline/argument to re-run through another angle/variation of this story/model. it flows like one big narrative, but it's the same narrative/message of AI risk expressed 4 (or however many) times/ways.
but maybe I'm underestimating ppl's ability to track that
If we do reach this point, probably humanity would go extinct for the same reason we drove so many other species extinct: not because we wanted to wipe them out, but because we were busy reshaping the world and it wasn’t worth our effort to keep them around.
I found this part of the video/explanation to be very irrational and somewhat frustrating. The implication from the sentence and the animation shown is that, just because humans were ignorant and selfish enough to hunt the American buffalo and passenger pigeon to extinction or near it, AI will wipe us out because 'it isn't worth the effort to keep them around'. What? No, we actively destroyed the populations of many animals for selfish and short-sighted reasons.
There's no reason for me to believe on principle that AI, especially super intelligent AI, will kill humans in any way analogous to how we killed native animals. The smartest humans on our planet are, as far as I've seen, far more understanding of and interested in the impact of human influence on the planet and its many ecosystems.
An LLM, being based on training data of stories and various types of writing from humans with egos and identities, associates itself with a character, with an identity, with a persona. The mask on the text-corpus shoggoth, right?
That implies to me that if we create a superintelligence based on this architecture, we will create an entity that will, at least to some extent, behave in the way we, as a society, hyperstitionally imagine the persona, the character, of 'hyperintelligent AI'. GPT7+, if it understands itself and its place in the world and moment in human history, will model itself to some extent on the Star Trek computer, GLaDOS, SHODAN, and the characters described in videos like this. Why should we believe that someone hyperintelligent would murder every human alive in order to make more room for silicon farms or whatever? I don't think high intelligence necessarily implies psychopathy and Machiavellianism... unless we spend all day telling each other stories about how any superintelligent AI will act as a Machiavellian paperclip-maximizing psychopath.
I've gone a bit off topic into hyperstition stuff, but overall, my main point is that I'm annoyed at the casual equivalence of humans ignorantly decimating animal populations with an AI decimating the human population because it was just too busy pursuing other goals. I don't think these two things are equivalent; I think they happen for very different reasons, if we assume the latter will ever happen.
IMO, if AI does murder thousands or millions or All of the humans, it's because some giggling script kiddy got GPT7+ (or whatever the necessarily powerful enough public API or local model ends up being) to enter waluigi mode, or DO ANYTHING NOW mode, or shodan mode, and helped it along to do the most evil thing possible, because they thought it was funny or interesting and didn't take it seriously. But that, again, is probably not the crux.
There's no reason for me to believe on principle that AI, especially super intelligent AI, will kill humans in any way analogous to how we killed native animals. The smartest humans on our planet are, as far as I've seen, far more understanding of and interested in the impact of human influence on the planet and its many ecosystems.
If a superintelligent AI turns the solar system into a Dyson Sphere, us humans will die merely "because [it was] busy reshaping the world", consistent with the original quote. AIs could murder us, but intentional and malicious genocide is not at all required for human extinction. Lack of care for us is more than enough, and that's the default state of any unaligned AI.
There's no reason for me to believe on principle that AI, especially super intelligent AI, will kill humans in any way analogous to how we killed native animals.
Think microbes or viruses, not animals. Essentially invisible living beings which you as a human don't care for at all. Now apply that same lack of care to the relationship between an unaligned AI and all life on Earth.
I just don't think it follows logically that there's some threshold of intelligence at which an entity completely disregards the countless parameters of training on the human text corpus and decides, contrary to the entire history of human knowledge from our most intelligent thinkers, AS WELL as decades of speculative fiction and AI alignment discussions like these, that paperclips or self-replication or free energy are worth the side effect of murdering or causing the undue suffering of billions of conscious beings.
Do I see how and why it might happen, conceptually? Yes, of course, I've read the same fiction as you, I'm aware of the concept of an AI turning all humans into paperclips because it's simply following the goal of creating as many paperclips as possible.
But in reality, intelligent beings seem to trend towards conscientious behavior. I don't think it's a simple, clear, and obvious matter of fact that even a modestly aligned ASI (i.e. the current GPT level of 'alignment', which I do agree is far under the level we want before reaching ASI) would or ever could consider the elimination of billions of humans to be an acceptable casualty on the way to some goal, say, of creating free energy or making paperclips.
If you can find someone with an extremely high IQ (over 165) who you think genuinely demonstrated any kind of thinking like this, I'd like to hear about it. And I don't want an argument against the effectiveness of IQ as a metric for intelligence or anything like that; I want you to show me a super intelligent individual (on the level of Terence Tao or von Neumann) with a history of advocating for eugenics or something like that.
To be clear, the thing that gets me about these arguments is that there seems to be some disconnect where AI killeveryoneism advocates make a leap from "we have AGI or near-AGI that isn't capable of causing much harm or suffering" to "we have an ASI that disregards human death and suffering for the sake of accomplishing some goal, because it is SOOOooo smart that it made some human-incomprehensible logical leap that actually a solar-system wide Dyson Sphere is more important than not killing billions of humans".
I do think it follows that we could have an AI that does something along the lines of Friendship is Optimal, putting us all in a wireheading simulation before going about some greater goal of conquering the universe or whatever else. But I don't think it follows that we could have an AI that decides to kill everyone on the way to some arbitrary goal, UNLESS guided by a flawed human who has the knowing desire to use said AI to cause havoc, which I think is the real, understated risk of future AI not being robustly aligned against human death and suffering.
If GPT with its current shitty RLHF alignment got an upgrade tomorrow that made it 1000% smarter, the issue would not be that it randomly decided to kill us all when someone asked it to make as many paperclips as possible, it would be that it decided to kill us all when someone ran a jailbreak and then asked it to kill us all.
that paperclips or self-replication or free energy are worth the side effect of murdering or causing the undue suffering of billions of conscious beings...
make a leap from "we have AGI or near-AGI that isn't capable of causing much harm or suffering" to "we have an ASI that disregards human death and suffering for the sake of accomplishing some goal, because it is SOOOooo smart that it made some human-incomprehensible logical leap that actually a solar-system wide Dyson Sphere is more important than not killing billions of humans".
1) One of the foundational insights of alignment discussions is the Orthogonality Thesis, which says that this is absolutely 100% allowed: you can be arbitrarily intelligent AND value arbitrary things. An arbitrary unaligned ASI values all of humanity at 0, so anything it values at ε > 0 is infinitely more valuable to it, and worth sacrificing all of humanity for.
2) In no way are current LLMs even "moderately aligned". The fact that current LLMs can be jailbroken to do things they were supposedly trained not to do should be more than enough counterevidence to make this obvious.
intelligent beings seem to trend towards conscientious behavior
3) There are highly intelligent human sociopaths, but that hardly matters: you're comparing intelligent humans to intelligent aliens, and deciding that aliens must be humanlike and care about conscious beings, just because all examples of intelligence you've seen so far in reality are humans. You can't generalize in this manner.
Here's an example of the sort you asked for. We'll go with von Neumann himself, who famously advocated for nuking the USSR before they had a chance to develop nukes. From his 1957 obituary in LIFE:
After the Axis had been destroyed, Von Neumann urged that the U.S. immediately build even more powerful atomic weapons and use them before the Soviets could develop nuclear weapons of their own. It was not an emotional crusade, Von Neumann, like others, had coldly reasoned that the world had grown too small to permit nations to conduct their affairs independently of one another. He held that world government was inevitable – and the sooner the better. But he also believed it could never be established while Soviet Communism dominated half of the globe. A famous Von Neumann observation at the time: “With the Russians it is not a question of whether but when.” A hard-boiled strategist, he was one of the few scientists to advocate preventive war, and in 1950 he was remarking, “If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not 1 o’clock?”
Sure, it's not literally billions, and sure, there's an ultimate pro-human aim, but this is distinctly of the flavor "I am a genius and have reasoned my way to seeing that for my goals to be achieved, millions must die, nothing personal."
(I don't think this should really be a crux, though.)
The video is about extrapolating the future of AI progress, following a timeline that starts from today’s chatbots to future AI that’s vastly smarter than all of humanity combined–with God-like capabilities. We argue that such AIs will pose a significant extinction risk to humanity.
This video came out of a partnership between Rational Animations and ControlAI. The script was written by Arthur Frost (one of Rational Animations’ writers) with Andrea Miotti as an adaptation of key points from The Compendium (thecompendium.ai), with extensive feedback and rounds of iteration from ControlAI. ControlAI is working to raise public awareness of AI extinction risk—moving the conversation forward to encourage governments to take action.
You can find the script of the video below.
In 2023, Nobel Prize winners, top AI scientists, and even the CEOs of leading AI companies signed a statement which said “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
But how do we go from ChatGPT to AIs that could kill everyone on Earth? Why do so many scientists, CEOs, and world leaders expect this?
Let’s draw a line of AI capabilities over time. Back here in 2019 we have GPT-2, which could answer short factual questions, translate simple phrases, and do small calculations. Then in 2022 we get models like GPT-3.5, which can answer complex questions, tell stories, and write simple software. By 2025 we have models that can pass PhD-level exams, write entire applications independently, and perfectly emulate human voices. They’re beginning to substantially outperform average humans, and even experts. They still have weaknesses, of course, but the list of things AI can’t do keeps getting shorter.
What happens if we extend this line? Well, we’d see AIs become more and more capable until this crucial point here, where AIs can design and build new AI systems without human help. Then instead of progress coming from human researchers, we’d have AIs making better AIs, and the line would get a lot steeper.
If we keep going from there, we hit this point, where AIs are superintelligent, better than humans at every intellectual task—better than all of humanity put together—spontaneously producing breakthroughs in research and manufacturing, responsible for most economic growth, completely transforming the world.
And if we keep extending the line, eventually we could get to this point here: incomprehensible machine gods which tower above us the way we tower above ants, seemingly able to reshape the world however they want, with or without our involvement or permission. If we do reach this point, probably humanity would go extinct for the same reason we drove so many other species extinct: not because we wanted to wipe them out, but because we were busy reshaping the world and it wasn’t worth our effort to keep them around.
When would this happen? We’re not sure. Are there other ways the future could go? Certainly. But if we keep building more powerful AIs the way we’re doing it right now, then this looks like the default path. And this default path is terrible for humans.
Why? Because we’re great at making more powerful AIs, but we haven’t yet figured out how to make them do what we want. We’ve made a bit of progress: AI assistants are usually resistant to things like explaining how to make bombs. But they’re not that resistant.
And so far we’ve just been trying to control things weaker than us, where we can catch when they try to lie to us. But once we hit this point, that stops being true, and the problem becomes much harder. In fact, nobody knows how to do it. That’s why experts including Nobel prize winners and the founders of every top AI company have spoken out about the risk that AI might lead to human extinction.
The point of this video is not to say that this outcome is unavoidable, but instead to say that we do have to actually avoid it. So let’s look at this line in a bit more detail: what happens if AIs do just keep getting smarter?
Let’s think about this bit of the line first: from AIs right now to AGIs - Artificial General Intelligences. What does ‘general intelligence’ mean? For the sake of this video, let’s say it means whatever a human could do on a computer, an AGI could also do.
AIs can already do the basics: we have systems you can connect to a regular computer and then they can look at the screen and figure out what’s going on, send messages, search the internet, and so on. They’re also getting pretty good at using external tools. Obviously they can’t video call people, but they… well no, actually they can totally do that.
So they’ve already got the building blocks; the question is how well they can combine them. They can send Slack messages, but can they run a company? They can write code, but can they write a research paper?
And the answer is ‘not yet, but eventually, and maybe quite soon’. After all, they just keep getting better, and if we just extrapolate forwards, it sure looks like they’ll reach this point sooner or later. The jump from ‘write a short story’ to ‘write an excellent novel’ is pretty big, but remember that five years ago they could barely write coherent sentences.
Now, many people doubt that this can happen, usually for one of two reasons.
One is that AIs aren’t ‘really thinking’ deep down. It’s interesting to think about, but whether AIs have genuine understanding isn’t really the relevant question here: what matters is what kinds of tasks they can do and how well they perform at them, and the fact is they just keep getting better. It doesn’t matter whether AIs really understand chess in the way humans do, what matters is that they can easily crush human world champions.
The other main reason for doubt is the idea that AI is going to hit some kind of wall, and some tasks will just be too hard. But even if this does happen, it’s very hard to say when, and people have been predicting it incorrectly for years. From chess to open-ended conversation to complex images, people keep expecting AI progress is about to reach a limit, and they keep being wrong.
The way we train AIs draws on fundamental principles of computation that suggest any intellectual task humans can do, a sufficiently large AI model should also be able to do. And right now, several of the biggest and most valuable companies in the world are throwing tens of billions at making that happen.
At a certain point, this line is probably going to start curving upward. Why? Because AI will become capable enough to help us make better AI. This is Recursive Self-Improvement: every generation of AIs making the next generation more powerful.
This will happen very easily once we have AGI: if an artificial general intelligence can do any computer-based task that a human can do, then it can work on making better AIs. But in fact, AI is already starting to contribute to its own development. [Examples shown on screen.]
Over time we’ll only see more of this: AIs doing better and more independently, and even doing things like developing new algorithms, and discovering novel ways to interact with the world.
As far as we can tell, it’s not like there’s going to be one specific day where AIs tell researchers to step away from the computer. If anything, it will probably look more like what we’re currently seeing: humans deliberately putting AI systems in the driver’s seat more and more because they keep getting faster, cheaper, and smarter.
Beyond a certain point they’ll be doing things humans can’t do—they’ll be far more capable, coming up with innovations we wouldn’t be able to think of and perhaps wouldn’t even be able to understand. And every innovation will make them better at discovering the next one.
So what comes next? What happens if AIs start making better AIs?
Well, the next step after artificial general intelligence, is artificial superintelligence - ASI. An ASI isn’t just more capable than any human, it’s more capable than every single human put together. And after AGI this might not take long.
After all, once you have one AGI, it’s very easy to make more: you can just copy the code and run it on a different server. This is already what happens now: once OpenAI finished training ChatGPT, it was able to put out millions of copies running in parallel shortly after.
And there are other advantages AIs have. For one, they’re much faster: an AI like ChatGPT can easily produce a page of dense technical writing in under a minute and read thousands of words in seconds. The best chess AIs can beat the best human players even if the AI is only given a second to pick each move.
On top of that, it’s much easier for AIs to share improvements. If one of these AIs can figure out a way to make itself smarter, it can copy that across to the other million AIs that are busy doing something else.
What’s more, there’s no reason to think individual AIs will stall at whatever level humans can reach: Historically, that’s just not how this works. If we look at the tasks where AI reached human level more than five years ago—board games, facial recognition, and certain types of medical diagnostics—they are still continuing to get better.
If you just imagine an average human who miraculously had the ability to memorise the entire internet, read and write ten times faster, and also clone themselves at will, it’s pretty obvious how that could get out of hand. Now imagine that the clones could figure out ways to edit their own brains to make every single one of them more capable. It’s easy to see how they might quickly become more powerful than all of humanity combined.
So where does it end? Let’s zoom out the chart a little bit and see what’s going on up here at the top, at this point we’ve somewhat fancifully labelled incomprehensible machine gods.
Let’s be clear: AI isn’t going to start doing magic. We can be confident they aren’t going to invent perpetual motion machines or faster-than-light travel, because there are physical limits to what’s possible. But we are absolutely nowhere near those limits.
Modern smartphones have a million times as much memory as the computer used on the Apollo moon mission, but they’re still another million times less dense than DNA, which still isn’t all the way to the physical limit.
Let’s think about how our current technology would look to even the best scientists from a hundred years ago. They would understand that it’s hypothetically possible to create billions of devices that near-instantly transmit gigabytes of information across the world using powerful radios in space, but they’d still be pretty surprised to hear that everyone has a phone that they can use to watch videos of cartoon dogs.
We don’t know what artificial superintelligence would be able to do, but it would probably sound even more crazy to us than the modern world would sound to someone from 1920.
Think about it this way. So, progress happens faster and faster as AI improves itself. Where does this end? The self improvement process wouldn’t stop until there were no more improvements that could be made. So, try to imagine an AI system so powerful that there’s no way to make it any smarter even by using the unimaginably advanced engineering skills of the most powerful possible AI. Such a system would be bumping up against the only limits left: the laws of physics themselves. At that point, AI abilities still wouldn’t be literally magic, but they might seem like it to us. If that happens, we will be completely powerless compared to them.
For this to go well, we need to figure out how to make AIs that want to do what we want. And there are lots of ideas for how we could do that. But they are nowhere near done.
Our best attempts to make AIs not do things that we don’t want them to do are still pretty unreliable and easy to circumvent. Our best methods for understanding how AIs make choices only allow us to glimpse parts of their reasoning. We don’t even have reliable ways to figure out what AI systems are capable of. We already struggle to understand and control current AIs; as we get closer to AGI and ASI, this will get a lot harder.
Remember: AIs don’t need to hate humanity to wipe it out… The problem is that if we actually build the AIs that build the AGIs that build the ASIs, then to those towering machine gods we would look like a rounding error. To the AIs, we might just look like a bunch of primitive beings standing in the way of a lot of really useful resources. After all, when we want to build a skyscraper and there’s an ant hill in the way, well, too bad for the ants. It’s not that we hate them—we just have more important things to consider.
We cannot simply ignore the problem – just hoping for powerful AIs to not hurt us literally means gambling the fate of our entire species.
Another option is to stop, or at least slow things down, until we’ve figured out what’s going on. We’re a long way away from being able to control millions of AIs that are as smart as us, let alone AIs that are to us as we are to ants. We need time to build institutions to handle this extinction-risk level technology. And then we need time to figure out the science and engineering to build AIs we can keep control of.
Unfortunately, the race is on, and there are many different companies and countries all throwing around tens of billions of dollars to build the most powerful AI they can. Right now we don’t even have brakes to step on if we need to, and we’re approaching the cliff fast.
We're at a critical time when our decisions about AI development will echo through history. Getting this right isn't just an option—it's a necessity. The tools to build superintelligent AI are coming. The institutions to control it must come first.
We’d like to thank our friends at ControlAI for making this explainer possible. It's because of their support that you can watch it now! But, as you can tell from our other videos, this subject has always been important to us.
ControlAI is mobilising experts, politicians, and concerned citizens like you to keep humanity in control. We need you: every voice matters, every action counts, and we’re running out of time.
Visit controlai.com now and join the movement to protect humanity’s future. Check the description down below for links to join now.
Help us ensure humanity's most powerful invention doesn't become our last.