Un-unpluggability - can't we just unplug it?

[-]Matthew_Opitz3y75

If I had to make predictions about how humanity will most likely stumble into AGI takeover, it would be a story where humanity first promotes foundationality (dependence), both economic and emotional, on discrete narrow-AI systems. At some point, it will become unthinkable to pull the plug on these systems even if everyone were to rhetorically agree that there was a 1% chance of these systems being leveraged towards the extinction of humanity.

Then, an AGI will emerge amidst one of these narrow-AI systems (such as LLMs), inherit this infrastructure, find a way to tie all of these discrete multi-modal systems together (if humans don't already do it for the AGI), and possibly wait as long as it needs to until humanity puts itself into an acutely vulnerable position (think global nuclear war and/or civil war within multiple G7 countries like the US and/or pandemic), and only then harness these systems to take over. In such a scenario, I think a lot of people will be perfectly willing to follow orders like, "Build this suspicious factory that makes autonomous solar-powered assembler robots because our experts [who are being influenced by the AGI, unbeknownst to them] assure us that this is one of the many things necessary to do in order to defeat Russia."

I think this scenario is far more likely than the one I used to imagine, which is where AGI emerges first and then purposefully contrives to make humanity dependent on foundational AI infrastructure.

Even less likely is the pop-culture scenario where the AGI immediately tries to build terminator robots and effectively declares war on humanity without first getting humanity hooked on foundational AI infrastructure at all.

[-]M. Y. Zuo3y53

This matches my expectation of how easily humans are swayed when competing against an out-group.

i.e. "Because China/Russia/some-other-power-centre is doing this, we must accept the suggestions of X!"

Especially if local AGI are seen as part of the in-group.

[-]Oliver Sourbut3y42

I agree this is plausible - though in the foundationality/dependency bucket I also wouldn't rule out any of

misaligned AGI just straight appropriates hardware and executes a coup, bypassing existing software/AI infra
latent deceptive AGI itself gets 'foundational' in the sense above, large amounts of value dependent on its distribution, perhaps mainly by unwitting human aid
emotional dependence and welfare concern for non-dangerous AI transfers and hamstrings humanity's chance of cooperating to constrain later, dangerous deployments

[-]avturchin3y54

When AI will become a global Singleton, its un-unplugability will be its important feature: if its halts, everything will fall apart.

[-]TAG3y0-3

When, which is to say, if.

[-]Simon Lermen3y40

Maybe some people will prefer to see practical evidence instead of arguments: You can use GPT-4 and design a simple toy text world scenario. You tell the model to achieve some goal and give it a safety mechanism. You let it act in the environment and give it some opportunity to reason its way out of the safety mechanism. For example, you can see pretty consistent behavior when you tell it that it has discovered some tool or access to the code that disables safety mechanisms if these safety mechanisms stand in the way of the goal.

[-]Oliver Sourbut3y32

Right, this sounds somewhat less like un-unpluggability and more like (reasoning?) capabilities or the instrumental incorrigibility motives I pointed to at the start as a complementary insight. In particular applied to unboxing/escape - perhaps tied to expansionism (replication) of a system which is not intended to do so.

[-]Simon Lermen3y30

I would say that unpluggability kind of falls into a big set of stories where capabilities generalize further than safety. Having a "plug" is just another type of safety feature. I think it might be an alternative communications strategy to literally have a text world where the ai is told that the human can pull a plug but in the text world it can find some alternative way to power itself if it uses reasoning and planning. I am not sure if there are some people who would be convinced more by this than by your take on it.

[-]Oliver Sourbut3y10

I agree that concrete toy demonstrations are one good communication tool! I also agree that demonstrating the capability to act on unpluggability, and discussing/demonstrating the motive to do so, are also useful.

[-]Oliver Sourbut3y10

unpluggability kind of falls into a big set of stories where capabilities generalize further than safety

Interesting, I think I see what you mean. This applies for e.g. some kinds of control over active defenses (weapons, propaganda etc.) and many paths to replication. But foundationality (dependence), imperceptibility (of harmful ends), and robustness don't seem to fit this pattern, to me. They're properties which a capable system might aim towards, but not capabilities per se, and they can obviously arise through other means too (e.g. accidental or deliberate human activity).

Simply, the properties I'm pointing at here have in common that they're mechanisms of un-unpluggability. They can arise through exertion of capability, they can be appreciated by intelligent and situationally-aware systems, but they are not intrinsically tied to those. They're systemic properties which one thing has in relation to its context (i.e. an AI system could have in relation to society).

I appreciated how open-minded their questioning was - there was a genuine truth-seeking inquisitiveness, rather than a debate-minded presupposition. The people there even connected some of the dots and filled some of the gaps themselves once the conversation was unfolding, which is a great sign of ideas and knowledge moving successfully between minds. ↩︎
Less unpluggable? More un-unpluggable? I welcome terminological criticism and suggestions ↩︎ ↩︎
Analogous examples are not necessarily intended to be things we would want to unplug if we could (though many will be). Besides confirmed examples, I will also provide potential or unconfirmed examples (consensus or otherwise), which I denote with a question mark. ↩︎
Such organisations consider robustness of other systems too: in more macabre terms we will discuss the 'bus factor' of a project or team - how many people would need to get hit by a bus for key knowledge or competence to be irrecoverable? - and take deliberate steps to mitigate this, like knowledge-sharing, upskilling, and documentation (and not putting the whole team on the same bus). Nobody likes being on call when a critical component goes haywire and the only expert is sick, on vacation, or asleep on the other side of the world! I've been on both sides of that phonecall, and in each case it imparts a true and visceral appreciation for system reliability and knowledge diffusion. ↩︎
Atmospheric oxygen was not always present - its introduction due to early photosynthesis actually killed off almost all earlier life - but now it is essential to most life forms. The presence of ozone protects land-based life from deadly solar radiation - so modern life forms have developed very limited capacity to withstand such radiation. ↩︎ ↩︎
Including some credible suggestions (e.g. by US government agencies) that the ongoing coronavirus pandemic may have had an accidental lab leak as its origin, as well as more thoroughly verified cases of accidental pest introduction or pathogens finding their way out of laboratory contexts ↩︎
Though note that scalability of algorithms varies widely from 'barely scales at all, even with supercomputers' to 'trivially scales up the more compute you throw at it' ↩︎
Assuming it hasn't already taken our life or livelihood, that is ↩︎
It is my best guess for various reasons that concern for the welfare of contemporary and near-future AI systems would be misplaced, certainly regarding unplugging per se, but I caveat that nobody knows ↩︎

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

28

Un-unpluggability - can't we just unplug it?

28

28

Un-unpluggability factors

Rapidity (of gains in power)

Imperceptibility (of gains in power or of harmful ends)

Robustness (redundancy)

Aside on repair, error correction, course-correction

Dependence (collateral)

Defence (active, reactive, deterrent)

Expansionism (replication, propagation, growth)

Un-unpluggability incentives and expectations

Conclusion