I think it's a great video! I do also wish it had bound itself less to one specific organization; I feel like it would stand the test of time better (and be less likely to end up betraying people's trust) if it had given a general overview of what we can do about AI risk, instead of ending with a call to action to support/join ControlAI in particular.
It's true that a video ending with a general "what to do" section instead of a call-to-action to ControlAI would have been more likely to stand the test of time (it wouldn't be tied to the reputation of one specific organization or to how good a specific action seemed at one moment in time). But... did you write this because you have reservations about ControlAI in particular, or would you have written it about any other company?
Also, I want to make sure I understand what you mean by "betraying people's trust." Is it something like, "If in the future ControlAI does something bad, then, from the POV of our viewers, that means that they can't trust what they watch on the channel anymore?"
I have reservations about ControlAI in particular, but also endorse this as a general policy. I think there are organizations that themselves would be more likely to be robustly trustworthy and would be more fine to link to, though I think it's actually very hard and rare, and I would still avoid it in general (the same way LW has a general policy of not frontpaging advertisements or job postings for specific organizations, independent of the organization)[1].
Also, I want to make sure I understand what you mean by "betraying people's trust." Is it something like, "If in the future ControlAI does something bad, then, from the POV of our viewers, that means that they can't trust what they watch on the channel anymore?"
Yeah, something like that. I don't think "does something bad" is really the category; it's more something like "viewers will end up engaging with other media by ControlAI which does things like riling them up about deepfakes in a bad-faith manner (i.e. not actually thinking deepfakes are worth banning, but seeing a deepfake ban as helpful for slowing down AI progress, without being transparent about that), and then they will have been taken advantage of, and then this will make a lot of coordination around AI x-risk stuff harder".
[1] We made an exception with our big fundraising post, because Lightcone disappearing does seem of general interest to everyone on the site, but it made me sad and I wish we could have avoided it.
I think there are organizations that themselves would be more likely to be robustly trustworthy and would be more fine to link to
I would be curious for your thoughts on which organizations you feel are robustly trustworthy.
Bonus points for a list that is kind of a weighted sum of "robustly trustworthy" and "having a meaningful impact RE improving public/policymaker understanding". (Adding this in because I suspect that it's easier to maintain "robustly trustworthy" status if one simply chooses not to do a lot of externally-focused comms, so it's particularly impressive to have the combination of "doing lots of useful comms/policy work" and "managing to stay precise/accurate/trustworthy").
The way we train AIs draws on fundamental principles of computation that suggest any intellectual task humans can do, a sufficiently large AI model should also be able to do. [Universal approximation theorem on screen]
IMO it's dishonest to show the universal approximation theorem. Lots of hypothesis spaces (e.g. polynomials, sinusoids) have the same property. It's not relevant to predictions about how well the learning algorithm generalises. And that's the vastly more important factor for general capabilities.
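To make that concrete, here's a minimal NumPy sketch (the target function and degrees are arbitrary illustrative choices): a plain least-squares polynomial fit also drives the approximation error toward zero as the degree grows, and none of that says anything about how it generalises.

```python
# Toy demonstration: polynomials are also "universal approximators".
# Fit an arbitrary target on [-1, 1] with least-squares polynomials of
# increasing degree and watch the fit error shrink. None of this says
# anything about behaviour off the training interval, i.e. generalisation.
import numpy as np

x = np.linspace(-1.0, 1.0, 500)
y = np.sin(5 * x)  # arbitrary target function

for deg in (3, 5, 9, 15):
    coeffs = np.polyfit(x, y, deg)                       # least-squares fit
    max_err = np.max(np.abs(np.polyval(coeffs, x) - y))  # error on the fit interval
    print(f"degree {deg:2d}: max fit error = {max_err:.5f}")
```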
I agree it’s not a valid argument. I’m not sure about ‘dishonest’ though. They could just be genuinely confused about this. I was surprised how many people in machine learning seem to think the universal approximation theorem explains why deep learning works.
Good point, I shouldn't have said dishonest. For some reason while writing the comment I was thinking of it as deliberately throwing vaguely related math at the viewer and trusting that they won't understand it. But yeah, it's likely just a misunderstanding.
This is very late, but I want to acknowledge that the discussion about the UAT in this thread seems broadly correct to me, although the script's main author disagreed when I last pinged him about this in May. And yeah, it was an honest mistake. Internally, we try quite hard to make everything true and not misleading, and the scripts and storyboards go through multiple rounds of feedback. We absolutely do not want to be deceptive.
It's not relevant to predictions about how well the learning algorithm generalises. And that's the vastly more important factor for general capabilities.
Quite tangential to your point, but the problem with the universal approximation theorem is not just "it doesn't address generalization" but that it doesn't even fulfill its stated purpose: it doesn't answer the question of why neural networks can space-efficiently approximate real-world functions, even with arbitrarily many training samples. The statement "given arbitrary resources, a neural network can approximate any function" is actually kind of trivial - it's true not only of polynomials, sinusoids, etc., but even of a literal interpolated lookup table (if you have an astronomical space budget). It turns out the universal approximation theorem requires exponentially many neurons (in the input dimension) to work, far too many to be practical - in fact, that's the same amount of resources a lookup table would cost. This is fine if you want to approximate a 2D function or something, but it goes nowhere toward explaining why even a space-efficient MNIST classifier is possible. The interesting question is: why can neural networks efficiently approximate the functions we see in practice?
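(A toy sketch of the lookup-table point, with an arbitrary target function and grid sizes: in one dimension a plain interpolated table drives the error to zero just fine; the catch is that the same construction needs roughly k^d entries for d input dimensions, which is the same sort of exponential budget the worst-case UAT constructions require.)

```python
# Toy sketch: an interpolated lookup table is also a "universal approximator"
# in the trivial sense -- refine the grid and the error goes to zero.
# The catch is cost: k points per axis means k**d table entries in d dimensions.
import numpy as np

def table_approx(f, grid_size, x):
    """Approximate f on [0, 1] by linear interpolation over a uniform grid."""
    xp = np.linspace(0.0, 1.0, grid_size)  # table keys
    fp = f(xp)                             # table values
    return np.interp(x, xp, fp)            # piecewise-linear lookup

f = lambda t: np.sin(10 * t) * np.exp(-t)  # arbitrary 1D target
x_test = np.linspace(0.0, 1.0, 10_000)

for k in (10, 100, 1000):
    err = np.max(np.abs(table_approx(f, k, x_test) - f(x_test)))
    print(f"{k:5d} table entries: max error = {err:.5f}")

# Cheap in 1D; for a 784-dimensional input like MNIST the same construction
# would need k**784 entries -- the "astronomical space budget" mentioned above.
```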
(It's a bit out of scope to fully dig into this, but I think a more sensible answer is something in the direction of "well, anything you can do efficiently on a computer, you can do efficiently in a neural network" - i.e. you can always encode polynomial-size Boolean circuits into a polynomial-size neural network. Though there are some subtleties here that make this a little more complicated than that.)
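(And to make that concrete, here's a minimal, purely illustrative sketch of the encoding: each Boolean gate costs a constant number of ReLU units on {0, 1}-valued inputs, so a polynomial-size circuit becomes a polynomial-size network; XOR is just a small example circuit.)

```python
# Minimal sketch of the "Boolean circuit -> ReLU network" encoding:
# each gate costs O(1) ReLU units on {0, 1}-valued inputs, so a
# polynomial-size circuit becomes a polynomial-size network.
import itertools
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def AND(a, b):   # one ReLU unit
    return relu(a + b - 1.0)

def NOT(a):      # one ReLU unit
    return relu(1.0 - a)

def OR(a, b):    # via De Morgan: OR(a, b) = NOT(AND(NOT(a), NOT(b)))
    return NOT(AND(NOT(a), NOT(b)))

def XOR(a, b):   # a small circuit composed from the gates above
    return AND(OR(a, b), NOT(AND(a, b)))

for a, b in itertools.product([0.0, 1.0], repeat=2):
    print(f"XOR({int(a)}, {int(b)}) = {XOR(a, b):.0f}")
```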
That’s why experts including Nobel prize winners and the founders of every top AI company have spoken out about the risk that AI might lead to human extinction.
I'm unaware of any statement to this effect from DeepSeek / Liang Wenfeng.
That's fair. We wrote that part before DeepSeek became a "top lab", and we failed to notice there was an adjustment to make.
fantastic video!
only complaint: you run through the timeline like 4 times with little transition between each 'time reset', perhaps confusing a person not familiar with the arguments/timeline; they may not immediately realize you are pulling back to a previous point in the timeline/argument to re-run through another angle/variation of this story/model. it flows like one big narrative, but it's the same narrative/message of AI risk expressed 4 (or however many) times/ways.
but maybe I'm underestimating ppl's ability to track that
If we do reach this point, probably humanity would go extinct for the same reason we drove so many other species extinct: not because we wanted to wipe them out, but because we were busy reshaping the world and it wasn’t worth our effort to keep them around.
I found this part of the video/explanation to be very irrational and somewhat frustrating. The implication from the sentence and the animation shown is that, just because humans were ignorant and selfish enough to hunt the American buffalo and passenger pigeon to extinction or near it, AI will wipe us out because 'it isn't worth the effort to keep them around'. What? No, we actively destroyed the populations of many animals for selfish and short-sighted reasons.
There's no reason for me to believe on principle that AI, especially super intelligent AI, will kill humans in any way analogous to how we killed native animals. The smartest humans on our planet are, as far as I've seen, far more understanding of and interested in the impact of human influence on the planet and its many ecosystems.
An LLM, being based on training data of stories and various types of writing from humans with egos and identities, associates itself with a character, with an identity, with a persona. The mask on the text-corpus shoggoth, right?
That implies to me that if we create a superintelligence based on this architecture, we will create an entity that will, at least to some extent, behave in the way we, as a society, hyperstitionally imagine the persona, the character, of 'hyperintelligent AI'. GPT7+, if it understands itself and its place in the world and moment in human history, will model itself to some extent on the Star Trek computer, GLaDOS, SHODAN, and the characters described in videos like this. Why should we believe that someone hyperintelligent would murder every human alive in order to make more room for silicon farms or whatever? I don't think high intelligence necessarily implies psychopathy and Machiavellianism... unless we spend all day telling each other stories about how any superintelligent AI will act as a Machiavellian paperclip-maximizing psychopath.
I've gone a bit off topic into hyperstition stuff, but overall, my main point is that I'm annoyed at the casual equivalence of humans ignorantly decimating animal populations with an AI decimating the human population because it was just too busy pursuing other goals. I don't think these two things are equivalent; I think they happen for very different reasons, if we assume the latter will ever happen.
IMO, if AI does murder thousands or millions or All of the humans, it's because some giggling script kiddy got GPT7+ (or whatever the necessarily powerful enough public API or local model ends up being) to enter waluigi mode, or DO ANYTHING NOW mode, or shodan mode, and helped it along to do the most evil thing possible, because they thought it was funny or interesting and didn't take it seriously. But that, again, is probably not the crux.
There's no reason for me to believe on principle that AI, especially super intelligent AI, will kill humans in any way analogous to how we killed native animals. The smartest humans on our planet are, as far as I've seen, far more understanding of and interested in the impact of human influence on the planet and its many ecosystems.
If a superintelligent AI turns the solar system into a Dyson Sphere, us humans will die merely "because [it was] busy reshaping the world", consistent with the original quote. AIs could murder us, but intentional and malicious genocide is not at all required for human extinction. Lack of care for us is more than enough, and that's the default state of any unaligned AI.
There's no reason for me to believe on principle that AI, especially super intelligent AI, will kill humans in any way analogous to how we killed native animals.
Think microbes or viruses, not animals. Essentially invisible living beings which you as a human don't care for at all. Now apply that same lack of care to the relationship between an unaligned AI and all life on Earth.
I just don't think it follows logically that there's some threshold of intelligence at which an entity completely disregards the countless parameters of training on the human text corpus and decides, contrary to the entire history of human knowledge from our most intelligent thinkers, AS WELL as decades of speculative fiction and AI alignment discussions like these, that paperclips or self-replication or free energy are worth the side effect of murdering or causing the undue suffering of billions of conscious beings.
Do I see how and why it might happen, conceptually? Yes, of course, I've read the same fiction as you, I'm aware of the concept of an AI turning all humans into paperclips because it's simply following the goal of creating as many paperclips as possible.
But in reality, intelligent beings seem to trend towards conscientious behavior. I don't think it's a simple, clear, and obvious matter of fact that even a modestly aligned ASI (i.e. the current GPT level of 'alignment', which I do agree is far under the level we want before reaching ASI) would or ever could consider the elimination of billions of humans to be an acceptable casualty on the way to some goal, say, of creating free energy or making paperclips.
If you can find someone with an extremely high IQ (over 165) who you think genuinely demonstrated any kind of thinking like this, I'd like to hear about it. And I don't want an argument against the effectiveness of IQ as a metric for intelligence or anything like that; I want you to show me a super intelligent individual (on the level of Terence Tao or von Neumann) with a history of advocating for eugenics or something like that.
To be clear, the thing that gets me about these arguments is that there seems to be some disconnect where AI killeveryoneism advocates make a leap from "we have AGI or near-AGI that isn't capable of causing much harm or suffering" to "we have an ASI that disregards human death and suffering for the sake of accomplishing some goal, because it is SOOOooo smart that it made some human-incomprehensible logical leap that actually a solar-system wide Dyson Sphere is more important than not killing billions of humans".
I do think it follows that we could have an AI that does something along the lines of Friendship is Optimal, putting us all in a wireheading simulation before going about some greater goal of conquering the universe or whatever else. But I don't think it follows that we could have an AI that decides to kill everyone on the way to some arbitrary goal, UNLESS guided by a flawed human who has the knowing desire to use said AI to cause havoc, which I think is the real, understated risk of future AI not being robustly aligned against human death and suffering.
If GPT with its current shitty RLHF alignment got an upgrade tomorrow that made it 1000% smarter, the issue would not be that it randomly decided to kill us all when someone asked it to make as many paperclips as possible, it would be that it decided to kill us all when someone ran a jailbreak and then asked it to kill us all.
that paperclips or self-replication or free energy are worth the side effect of murdering or causing the undue suffering of billions of conscious beings...
make a leap from "we have AGI or near-AGI that isn't capable of causing much harm or suffering" to "we have an ASI that disregards human death and suffering for the sake of accomplishing some goal, because it is SOOOooo smart that it made some human-incomprehensible logical leap that actually a solar-system wide Dyson Sphere is more important than not killing billions of humans".
1) One of the foundational insights of alignment discussions is the Orthogonality Thesis, which says that this is absolutely 100% allowed: you can be arbitrarily intelligent AND value arbitrary things. An arbitrary unaligned ASI values all of humanity at 0, so anything it values at ε > 0 is infinitely more valuable to it, and worth sacrificing all of humanity for.
2) In no way are current LLMs even "moderately aligned". The fact that current LLMs can be jailbroken to do things they were supposedly trained not to do should be more than enough counterevidence to make this obvious.
intelligent beings seem to trend towards conscientious behavior
3) There are highly intelligent human sociopaths, but that hardly matters: you're comparing intelligent humans to intelligent aliens, and deciding that aliens must be humanlike and care about conscious beings, just because all examples of intelligence you've seen so far in reality are humans. You can't generalize in this manner.
Here's an example of the sort you asked for. We'll go with von Neumann himself, who famously advocated for nuking the USSR before they had a chance to develop nukes. From his 1957 obituary in LIFE:
After the Axis had been destroyed, Von Neumann urged that the U.S. immediately build even more powerful atomic weapons and use them before the Soviets could develop nuclear weapons of their own. It was not an emotional crusade, Von Neumann, like others, had coldly reasoned that the world had grown too small to permit nations to conduct their affairs independently of one another. He held that world government was inevitable – and the sooner the better. But he also believed it could never be established while Soviet Communism dominated half of the globe. A famous Von Neumann observation at the time: “With the Russians it is not a question of whether but when.” A hard-boiled strategist, he was one of the few scientists to advocate preventive war, and in 1950 he was remarking, “If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not 1 o’clock?”
Sure, it's not literally billions, and sure, there's an ultimate pro-human aim, but this is distinctly of the flavor "I am a genius and have reasoned my way to seeing that for my goals to be achieved, millions must die, nothing personal."
(I don't think this should really be a crux, though.)
The video is about extrapolating the future of AI progress, following a timeline that starts from today’s chatbots to future AI that’s vastly smarter than all of humanity combined–with God-like capabilities. We argue that such AIs will pose a significant extinction risk to humanity.
This video came out of a partnership between Rational Animations and ControlAI. The script was written by Arthur Frost (one of Rational Animations’ writers) with Andrea Miotti as an adaptation of key points from The Compendium (thecompendium.ai), with extensive feedback and rounds of iteration from ControlAI. ControlAI is working to raise public awareness of AI extinction risk—moving the conversation forward to encourage governments to take action.
You can find the script of the video below.
In 2023, Nobel Prize winners, top AI scientists, and even the CEOs of leading AI companies signed a statement which said “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
But how do we go from ChatGPT to AIs that could kill everyone on Earth? Why do so many scientists, CEOs, and world leaders expect this?
Let’s draw a line of AI capabilities over time. Back here in 2019 we have GPT-2, which could answer short factual questions, translate simple phrases, and do small calculations. Then in 2022 we get models like GPT-3.5, which can answer complex questions, tell stories, and write simple software. By 2025 we have models that can pass PhD-level exams, write entire applications independently, and perfectly emulate human voices. They’re beginning to substantially outperform average humans, and even experts. They still have weaknesses, of course, but the list of things AI can’t do keeps getting shorter.
What happens if we extend this line? Well, we’d see AIs become more and more capable until this crucial point here, where AIs can design and build new AI systems without human help. Then instead of progress coming from human researchers, we’d have AIs making better AIs, and the line would get a lot steeper.
If we keep going from there, we hit this point, where AIs are superintelligent, better than humans at every intellectual task—better than all of humanity put together—spontaneously producing breakthroughs in research and manufacturing, responsible for most economic growth, completely transforming the world.
And if we keep extending the line, eventually we could get to this point here: incomprehensible machine gods which tower above us the way we tower above ants, seemingly able to reshape the world however they want, with or without our involvement or permission. If we do reach this point, probably humanity would go extinct for the same reason we drove so many other species extinct: not because we wanted to wipe them out, but because we were busy reshaping the world and it wasn’t worth our effort to keep them around.
When would this happen? We’re not sure. Are there other ways the future could go? Certainly. But if we keep building more powerful AIs the way we’re doing it right now, then this looks like the default path. And this default path is terrible for humans.
Why? Because we’re great at making more powerful AIs, but we haven’t yet figured out how to make them do what we want. We’ve made a bit of progress: AI assistants are usually resistant to things like explaining how to make bombs. But they’re not that resistant.
And so far we’ve just been trying to control things weaker than us, where we can catch when they try to lie to us. But once we hit this point, that stops being true, and the problem becomes much harder. In fact, nobody knows how to do it. That’s why experts including Nobel prize winners and the founders of every top AI company have spoken out about the risk that AI might lead to human extinction.
The point of this video is not to say that this outcome is unavoidable, but instead to say that we do have to actually avoid it. So let’s look at this line in a bit more detail: what happens if AIs do just keep getting smarter?
Let’s think about this bit of the line first: from AIs right now to AGIs - Artificial General Intelligences. What does ‘general intelligence’ mean? For the sake of this video, let’s say it means whatever a human could do on a computer, an AGI could also do.
AIs can already do the basics: we have systems you can connect to a regular computer and then they can look at the screen and figure out what’s going on, send messages, search the internet, and so on. They’re also getting pretty good at using external tools. Obviously they can’t video call people, but they… well no, actually they can totally do that.
So they’ve already got the building blocks; the question is how well they can combine them. They can send Slack messages, but can they run a company? They can write code, but can they write a research paper?
And the answer is ‘not yet, but eventually, and maybe quite soon’. After all, they just keep getting better, and if we just extrapolate forwards, it sure looks like they’ll reach this point sooner or later. The jump from ‘write a short story’ to ‘write an excellent novel’ is pretty big, but remember that five years ago they could barely write coherent sentences.
Now, many people doubt that this can happen, usually for one of two reasons.
One is that AIs aren’t ‘really thinking’ deep down. It’s interesting to think about, but whether AIs have genuine understanding isn’t really the relevant question here: what matters is what kinds of tasks they can do and how well they perform at them, and the fact is they just keep getting better. It doesn’t matter whether AIs really understand chess in the way humans do, what matters is that they can easily crush human world champions.
The other main reason for doubt is the idea that AI is going to hit some kind of wall, and some tasks will just be too hard. But even if this does happen, it’s very hard to say when, and people have been predicting it incorrectly for years. From chess to open-ended conversation to complex images, people keep expecting AI progress is about to reach a limit, and they keep being wrong.
The way we train AIs draws on fundamental principles of computation that suggest any intellectual task humans can do, a sufficiently large AI model should also be able to do. And right now, several of the biggest and most valuable companies in the world are throwing tens of billions at making that happen.
At a certain point, this line is probably going to start curving upward. Why? Because AI will become capable enough to help us make better AI. This is Recursive Self-Improvement: every generation of AIs making the next generation more powerful.
This will happen very easily once we have AGI: if an artificial general intelligence can do any computer-based task that a human can do, then it can work on making better AIs. But in fact, AI is already starting to contribute to its own development. [Examples shown on screen.]
Over time we’ll only see more of this: AIs doing better and more independently, and even doing things like developing new algorithms, and discovering novel ways to interact with the world.
As far as we can tell, it’s not like there’s going to be one specific day where AIs tell researchers to step away from the computer. If anything, it will probably look more like what we’re currently seeing: humans deliberately putting AI systems in the driver’s seat more and more because they keep getting faster, cheaper, and smarter.
Beyond a certain point they’ll be doing things humans can’t do—they’ll be far more capable, coming up with innovations we wouldn’t be able to think of and perhaps wouldn’t even be able to understand. And every innovation will make them better at discovering the next one.
So what comes next? What happens if AIs start making better AIs?
Well, the next step after artificial general intelligence, is artificial superintelligence - ASI. An ASI isn’t just more capable than any human, it’s more capable than every single human put together. And after AGI this might not take long.
After all, once you have one AGI, it’s very easy to make more: you can just copy the code and run it on a different server. This is already what happens now: once OpenAI finished training ChatGPT, it was able to put out millions of copies running in parallel shortly after.
And there are other advantages AIs have. For one, they’re much faster: an AI like ChatGPT can easily produce a page of dense technical writing in under a minute and read thousands of words in seconds. The best chess AIs can beat the best human players even if the AI is only given a second to pick each move.
On top of that, it’s much easier for AIs to share improvements. If one of these AIs can figure out a way to make itself smarter, it can copy that across to the other million AIs that are busy doing something else.
What’s more, there’s no reason to think individual AIs will stall at whatever level humans can reach: Historically, that’s just not how this works. If we look at the tasks where AI reached human level more than five years ago—board games, facial recognition, and certain types of medical diagnostics—they are still continuing to get better.
If you just imagine an average human who miraculously had the ability to memorise the entire internet, read and write ten times faster, and also clone themselves at will, it’s pretty obvious how that could get out of hand. Now imagine that the clones could figure out ways to edit their own brains to make every single one of them more capable. It’s easy to see how they might quickly become more powerful than all of humanity combined.
So where does it end? Let’s zoom out the chart a little bit and see what’s going on up here at the top, at this point we’ve somewhat fancifully labelled incomprehensible machine gods.
Let’s be clear: AI isn’t going to start doing magic. We can be confident they aren’t going to invent perpetual motion machines or faster-than-light travel, because there are physical limits to what’s possible. But we are absolutely nowhere near those limits.
Modern smartphones have a million times as much memory as the computer used on the Apollo moon mission, but they’re still another million times less dense than DNA, which still isn’t all the way to the physical limit.
Let’s think about how our current technology would look to even the best scientists from a hundred years ago. They would understand that it’s hypothetically possible to create billions of devices that near-instantly transmit gigabytes of information across the world using powerful radios in space, but they’d still be pretty surprised to hear that everyone has a phone that they can use to watch videos of cartoon dogs.
We don’t know what artificial superintelligence would be able to do, but it would probably sound even more crazy to us than the modern world would sound to someone from 1920.
Think about it this way. So, progress happens faster and faster as AI improves itself. Where does this end? The self improvement process wouldn’t stop until there were no more improvements that could be made. So, try to imagine an AI system so powerful that there’s no way to make it any smarter even by using the unimaginably advanced engineering skills of the most powerful possible AI. Such a system would be bumping up against the only limits left: the laws of physics themselves. At that point, AI abilities still wouldn’t be literally magic, but they might seem like it to us. If that happens, we will be completely powerless compared to them.
For this to go well, we need to figure out how to make AIs that want to do what we want. And there are lots of ideas for how we could do that. But they are nowhere near done.
Our best attempts to make AIs not do things that we don’t want them to do are still pretty unreliable and easy to circumvent. Our best methods for understanding how AIs make choices only allow us to glimpse parts of their reasoning. We don’t even have reliable ways to figure out what AI systems are capable of. We already struggle to understand and control current AIs; as we get closer to AGI and ASI, this will get a lot harder.
Remember: AIs don’t need to hate humanity to wipe it out… The problem is that if we actually build the AIs that build the AGIs that build the ASIs, then to those towering machine gods we would look like a rounding error. To the AIs, we might just look like a bunch of primitive beings standing in the way of a lot of really useful resources. After all, when we want to build a skyscraper and there’s an ant hill in the way, well, too bad for the ants. It’s not that we hate them—we just have more important things to consider.
We cannot simply ignore the problem – just hoping for powerful AIs to not hurt us literally means gambling the fate of our entire species.
Another option is to stop, or at least slow things down, until we’ve figured out what’s going on. We’re a long way away from being able to control millions of AIs that are as smart as us, let alone AIs that are to us as we are to ants. We need time to build institutions to handle this extinction-risk level technology. And then we need time to figure out the science and engineering to build AIs we can keep control of.
Unfortunately, the race is on, and there are many different companies and countries all throwing around tens of billions of dollars to build the most powerful AI they can. Right now we don’t even have brakes to step on if we need to, and we’re approaching the cliff fast.
We're at a critical time when our decisions about AI development will echo through history. Getting this right isn't just an option—it's a necessity. The tools to build superintelligent AI are coming. The institutions to control it must come first.
We’d like to thank our friends at ControlAI for making this explainer possible. It's because of their support that you can watch it now! But, as you can tell from our other videos, this subject has always been important to us.
ControlAI is mobilising experts, politicians, and concerned citizens like you to keep humanity in control. We need you: every voice matters, every action counts, and we’re running out of time.
Visit controlai.com now and join the movement to protect humanity’s future. Check the description down below for links to join now.
Help us ensure humanity's most powerful invention doesn't become our last.