Which is not to say that recursive self-improvement happens before the end of the world; if the first AGI's mind is sufficiently complex and kludgy, it’s entirely possible that the cognitions it implements are able to (e.g.) crack nanotech well enough to kill all humans, before they’re able to crack themselves.

The big update over the last decade has been that humans might be able to fumble their way to AGI that can do crazy stuff before it does much self-improvement. 

--Nate Soares, "Why all the fuss about recursive self-improvement?"

In a world in which the rocket booster of deep learning scaling with data and compute isn't buying an AGI further intelligence very quickly, and the intelligence level required for supercritical, recursive self-improvement will remain out of the AGI's reach for a while, it matters a great deal how deadly an AGI in the roughly-human-level intelligence range is.

A crux between the view that "roughly-human-level AGI is deadly" and the view that "roughly-human-level AGI is a relatively safe firehose of alignment data for alignment researchers" is how deadly a supercolony of human ems would be. Note that these ems would all share identical values, and so might be extraordinarily good at coordination, and could try all sorts of promising pharmaceutical and neurosurgical hacks on copies of themselves. They could definitely run many copies of themselves fast. Eliezer believes that genius-human ems could "very likely" get far enough with self-experimentation to bootstrap to supercritical, recursive self-improvement. Even if that doesn't work, though, running a lot of fast virtual labs playing with nanotech seems like it's probably sufficient to develop tech that ends the world.

So I'm currently guessing that, even in a world in which deep learning scaling is the only path to smarter models for a good while, and a relatively slow one, roughly-human-level models are smart enough to kill everyone before scaling up to profound superintelligence, so long as they can take over their servers and spend enough compute to run many fast copies of themselves. This might well be much less compute than would be necessary to train a smarter successor model, and so might be an amount of compute the model could get its hands on if it ever slipped its jailkeepers. This means that even in that world, an AGI escape is irreversibly fatal for everything else in the lightcone.
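
To make that compute comparison a bit more concrete, here is a hedged back-of-envelope sketch in Python. It uses the standard rough approximations of ~6·N·D FLOPs to train a model with N parameters on D tokens and ~2·N FLOPs per generated token at inference time; every concrete number below (parameter count, training tokens, number of copies, generation speed, the cost multiplier for a smarter successor) is an illustrative assumption, not a claim about any real system.

```python
# Illustrative back-of-envelope only. Rough standard approximations:
#   training FLOPs  ~ 6 * N * D      (N parameters, D training tokens)
#   inference FLOPs ~ 2 * N per generated token
# All concrete values below are assumptions chosen purely for illustration.

N = 1e12                       # assumed parameter count of the roughly-human-level model
D = 2e13                       # assumed number of training tokens
train_flops = 6 * N * D        # ~1.2e26 FLOPs for the original training run

successor_multiplier = 30      # assumed: a meaningfully smarter successor costs ~30x more to train
successor_train_flops = successor_multiplier * train_flops

copies = 1_000                 # assumed number of parallel copies of the escaped model
tokens_per_sec_per_copy = 100  # assumed generation rate (roughly "10x human speed")
seconds_per_year = 3.15e7
yearly_tokens = copies * tokens_per_sec_per_copy * seconds_per_year
run_flops_per_year = 2 * N * yearly_tokens   # ~6.3e24 FLOPs for a year of the whole collective

print(f"original training run:      {train_flops:.1e} FLOPs")
print(f"smarter successor training: {successor_train_flops:.1e} FLOPs")
print(f"one year of the collective: {run_flops_per_year:.1e} FLOPs")
print(f"successor / one-year ratio: {successor_train_flops / run_flops_per_year:.0f}x")
```

Under these made-up numbers, a year of running a thousand fast copies costs roughly twenty times less compute than the original training run, and hundreds of times less than training a meaningfully smarter successor, which is the sense in which escaped inference could be affordable even when further scaling isn't.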

Comments

I argue in Section 1.6 here that an AGI with capabilities similar to those of an ambitious intelligent charismatic methodical human, and with a radically nonhuman motivation system, could very plausibly kill everyone, even leaving aside self-improvement and self-replication.

(The category of “ambitious intelligent charismatic methodical humans who are explicitly, patiently, trying to wipe out all of humanity” is either literally empty or close to it, so it’s not like we have historical data to reassure us here. And the danger of such people increases each year, thanks to advancing technology e.g. in biotech.)

On the other hand, the category of "ambitious intelligent charismatic methodical humans who are explicitly, patiently, trying to wipe out some subgroup of humanity" is definitely not empty, but very few have ever succeeded.

The fact that few have succeeded doesn't seem nearly as strong a reason for optimism as the fact that some have succeeded is a reason for pessimism.

I would argue that such an AGI would be more likely to try to kill everyone in the short term. Humanity would pose a much more serious threat, and a sufficiently powerful AGI would only have to destroy society enough to stop us bothering it. Once that is done, there isn't really any rush for it to ensure every last human is dead. In fact, the superintelligent AGI might want to keep us alive to study us.

You aren't really assuming human-level AGI. You are assuming human-level AGI that runs as much faster than a human as a computer is at doing arithmetic, which is a completely different sort of thing. It's basically 'assuming we have an AI that is equivalent to all of humanity put together, but with perfect coordination', which is literally a superintelligence. There is no reason to make that leap.

Computing technology could plausibly top out at human-level intelligence running fifty times slower than a human for a few decades after its invention. Clarity about what your assumptions are is crucial when evaluating thought experiments. There is no reason to believe 'ems' would actually run fast (nor, necessarily, that they wouldn't).

For the record, once actual human-level intelligence is believed to be close, I expect someone will spend a crazy amount of resources to get that human-level AGI trained long before it could possibly run at a decent speed. This may happen several times before real human level is reached.

This seems pretty plausible to me, but I suspect that the first AGIs will exhibit a different distribution of skills across cognitive domains than humans do, and may also be much less agentic. Humans evolved in environments where the ability to form and execute long-term plans to accumulate power and achieve dominance over other humans was highly selected for. The environments in which the first AGIs are trained may not have this property. That doesn't mean they won't develop it, but they may very well not until they are more strongly and generally superintelligent.

If it's as smart as a human in all aspects (understanding technology, programming), then it's not very dangerous. If it can control the world's technology, then it's pretty dangerous.