I agree that there seems to be a lot of handwaving about the nanotech argument, but I can't say that I agree here:

>But for the sake of argument, let's say that the AGI does manage to create a nanotech factory, retain control, and still remain undetected by the humans. 

>It doesn't stay undetected long enough to bootstrap and mass produce human replacement infrastructure. 

It seems like the idea is that the AI would create nanomachines that it could host itself on while starting to grey goo enough of the Earth to overtake humanity. While humans would notice this at an early stage, I could see it being possible that the AI disperses itself quickly enough that it can't be suppressed completely, at which point humanity losing to a grey goo wave would be inevitable.

The alternative story that I've seen is that the AI engineers a dormant virus that is transmitted to most of humanity without generating alarm, and then suddenly activates to kill every human. This also seems handwavey, but it does skip the "AI would need to establish its own nation" phase.

>then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.

Yes, but should I take it for granted that an AI will be able to manipulate a human into creating a virus that kills literally everyone on Earth, or at least enough people to let the AI enact its secondary plans to take over the world? Without being detected? I wouldn't put that at anywhere near 100% probability. I just think these sorts of arguments should be subject to Drake equation-style reasoning, which dilutes the likelihood of doom under most circumstances.
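To make that Drake-style point concrete, here is the kind of decomposition I have in mind. The factors and the numbers are purely illustrative assumptions, not estimates I'm defending:

$$P(\text{doom}) \approx P(\text{tries}) \cdot P(\text{undetected} \mid \text{tries}) \cdot P(\text{weapon works} \mid \text{undetected}) \cdot P(\text{no effective response} \mid \text{weapon works})$$

Even with fairly pessimistic illustrative values, say 0.9 × 0.5 × 0.3 × 0.5, the product is about 0.07: a conjunction of several uncertain steps multiplies out to something much smaller than any single factor.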

This isn't an argument for being complacent. But it does allow us to push back against the idea that "we only have one shot at this."

I outlined my expectations, not a "plan".

>You lack imagination, its painfully easy, also cost + required IQ has been dropping steadily every year.

Conversely, it's possible that doomers are suffering from an overabundance of imagination here. To be a bit blunt, I don't take it for granted that an arbitrarily smart AI would be able to manipulate a human into developing a supervirus or nanomachines in a risk-free fashion.

The fast takeoff doom scenarios seem like they should be subject to Drake equation-style analyses to determine P(doom). Even if we develop malevolent AIs, I'd say that P(doom | AGI tries to harm humans) is significantly less than 100%. Obviously, detecting one attempt would not necessarily prevent future incidents, but I'd expect enough of a response that I don't see how people can put P(doom) at 95% or more.

>The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15”, “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens”.

But obviously these metaphors are not very apt, since humanity kinda has a massive incumbent advantage that would need to be overcome. Rome Sweet Rome is a fun story not because 21st century soldiers and Roman legionnaires are intrinsically equals, but because the technologically superior side starts out facing down a massive incumbent power.

One thing that I've always found a bit handwavey about the hard takeoff scenarios is that they tend to assume that a superintelligent AI would actually be able to plot out a pathway from being in a box to eliminating humanity that is basically guaranteed to succeed. These stories tend to involve the assumption that the AI will be able to invent highly potent weapons very quickly and without risk of detection, but it seems at least pretty plausible that this is just too difficult. I think it's likely that we'll see several failed AI takeover attempts before a success occurs, and hopefully we'll learn something from these early failures that will slow things down.

>As it turns out, the only thing that matters was scale.

I mean, in some sense yes. But AlphaGo wasn't trained by finding a transcript of every Go game that had ever been played; it was trained via self-play RL. And attempts to create general game-playing agents via similar methods haven't worked out very well, in my understanding. I don't assume that throwing 10x or 100x the data at them would change that.

>The architecture that can play 100 games and does extremely well at game 101 the first try gets way more points than one that doesn't.  The one that has never read a book on the topic of the LSAT but still does well on the exam is exactly what we are looking for.

Yes, but the latter exists and is trained via reinforcement learning from human feedback, which can't be translated to self-play. The former doesn't exist as far as I can tell. I don't see anyone proposing to improve GPT-4 by switching from RLHF to self-play RL.

Ultimately I think there's a possibility that the improvements to LLMs from further scaling may not be very large, and instead we'll need to find some sort of new architecture to create dangerous AGIs.

>The finished system should be able to extend shoggoth tentacles into a given computer, identify what that computer is doing and make it do it better or differently.

Sure. GPT-X will probably help optimize a lot of software. But I don't think greater resource efficiency should be assumed to lead to recursive self-improvement beyond where we'd be given a "perfect" use of current software tools. Will GPT-X be able to break out of that current set of tools, having only been trained to complete text and not to actually optimize systems? I don't take this for granted, and my view is that LLMs are unlikely to devise radically new software architectures on their own.

Sure, this is useful. To your other posts, I don't think we're really disagreeing about what AGI is - I think we'd agree that if you took a model with GPT-4-like capabilities and hooked it up to a chess API to reinforce it, you would end up with a GPT-4 model that's very good at playing chess, not something that has strongly improved its general underlying world model and thus would also be able to, say, improve its LSAT score. And this is what I'm imagining most self-play training would accomplish... but I'm open to being wrong. To your point about having a "benchmark of many tasks", I guess I could imagine hooking it up to, say, 100 different self-playing games which are individually easy to run but require vastly different skills to master, but I could also see this just... not working very well. Teams have been trying this for a decade or so already, right? A breakthrough is possible though, for sure.

I'm just trying to underscore that there are lots of tasks we hope AGIs would be able to accomplish (e.g. solving open math problems), but we probably cannot use RL to directly iterate a model toward them, because we can't define a reward gradient that would steer the model there.

I'm asking specifically about the assertion that "RL style self play" could be used to iterate to AGI. I don't see what sort of game could lead to this outcome. You can't have this sort of self-play with "solve this math problem" as far as I can tell, and even if you could I don't see why it would promote AGI as opposed to something that can solve a narrow class of math problems.

Obviously LLMs have amazing generalist capabilities. But as far as I can tell, you can't iterate on the next version of these models by hooking them up to some sort of API that provides useful, immediate feedback... we're not at the cusp of removing the HF part of the RLHF loop. I think understanding this is key to judging the relative likelihood of slow versus fast takeoff.

RL isn't magic, though. It works in the Go case because we can simulate Go games quickly, easily score the results, and then pit adversarial AIs against each other to iteratively learn - roughly the loop sketched below.
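As a minimal sketch of what I mean by that loop - every name here is a hypothetical placeholder, not any real library's API - the whole thing depends on a cheap simulator and an unambiguous score:

```python
import random

class GoLikeGame:
    """Stand-in for a fast, perfectly scoreable simulator (what Go gives you for free)."""
    def play(self, agent_a, agent_b):
        # A real simulator would play out a full game; here we just pick a winner
        # with probability proportional to a toy "strength" number.
        total = agent_a.strength + agent_b.strength
        return agent_a if random.random() < agent_a.strength / total else agent_b

class Agent:
    def __init__(self, strength=1.0):
        self.strength = strength

    def improved_copy(self):
        # Placeholder for a policy-improvement / gradient step driven by game outcomes.
        return Agent(self.strength + random.random())

def self_play(iterations=1000):
    env = GoLikeGame()
    champion = Agent()
    for _ in range(iterations):
        challenger = champion.improved_copy()
        champion = env.play(champion, challenger)  # keep whichever agent wins
    return champion
```

The loop only turns compute into skill because the simulation is cheap and the scoring is unambiguous. Swap the game out for "write a better Winds of Winter" and there is no agreed-upon scoring function to drive the improvement step, which is the problem I'm pointing at below.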

I don't think this sort of process lends itself to the sorts of tasks that only an AGI could accomplish. You can't train it to, say, write a better version of Winds of Winter than GRRM could, because you don't have a good algorithm to score each iteration.

So what I'm really trying to ask is: what specific sorts of open-ended problems do we see as particularly conducive to fostering AGI, as opposed to a local maximizer that's highly specialized towards the particular problem?

>First problem, A lot of future gains may come from RL style self play (IE:let the AI play around solving open ended problems) 

How do people see this working? I understand the value of pointing to AI dominance in chess/Go as illustrating how we should expect AI to recursively exceed humans at tasks, but I can't see how RL would be similarly applied to "open-ended problems" to produce similarly explosive learning. What kind of open problems with a clear and instantly discernible reward function would promote AGI growth, rather than a narrower type of growth geared towards solving the particular problem well?
