Disclaimer: This post is largely based on a long response I wrote to someone asking why it would be hard to stop or remove an AI that has escaped onto the internet. It includes a list of obvious strategies such an agent could employ. I have thought about the security risk of sharing such a list online. However, I think these points are mostly obvious and would clearly be considered by any AGI at least as smart as an intelligent human. Right now, explaining to people what these risks actually look like in practice seems more valuable than hiding them in the hope that such lists do not already exist elsewhere on the internet and that an escaped rogue AI would not think of these strategies unless exposed to them.
The exact technical reasons why I think these strategies are plausible are not discussed in this post, but if you have questions about why I think certain things are possible, feel free to bring them up in the comments.
[...] so, what is the reasoning here? how do people imagine this playing out?
[...] please answer based on currently available technology and not something hypothetical
I will try to answer this question under the additional premise of "... with the exception of an AGI whose intellectual problem-solving ability is at least equal to (proficient) human level across the majority of tasks, but not necessarily genius or extremely superhuman in most of them."
[asking about the idea that an AI could spread and run instances of itself in the form of a distributed network stored in different places on the internet]
isn't that more or less what you do on a torrent?
Similar, yes. I think the analogy is broadly accurate, with the exception that torrents are essentially built for others to "tune in" and send/receive parts of files that are generally unencrypted but can be verified for integrity against publicly known cryptographic hashes.
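To make the hash part concrete, here is a minimal sketch of piece verification. BitTorrent v1 publishes a SHA-1 hash per piece in the torrent's metadata; everything else here (the tiny `PIECE_SIZE`, the helper names `piece_hashes` and `verify_piece`, the sample data) is made up for illustration and is not how any particular client implements it.

```python
import hashlib

# Hypothetical, unrealistically small piece size so the example stays readable.
PIECE_SIZE = 16


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """Check a received piece against the publicly known hash for its index."""
    return hashlib.sha1(piece).digest() == expected[index]


# The publisher computes the hashes once and shares them openly.
original = b"some publicly shared, unencrypted file contents"
published = piece_hashes(original)

# A downloader can check each piece it receives from an untrusted peer.
good = original[0:PIECE_SIZE]
tampered = b"X" + good[1:]

print(verify_piece(0, good, published))
print(verify_piece(0, tampered, published))
```

The point of the design is that anyone holding the public hashes can check any piece from any peer, which is exactly the openness that a covertly distributed system would not want.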
However, one of the things that I should point out here is that in general, the concern of people who actually know what they are talking about is of course not solely based on the potential ability of the AI to spread copies of itself over the internet - this is merely a facet of it; one of the many (absolutely non-scifi) strategies that we should expect such an agent to be capable of.
A list of such strategies (very likely incomplete; my imagination has limits - so whatever I am writing here, it's the MINIMUM of what to expect):
- creation of distributed "sleeper" copies/fragments on a wide range of hardware devices that are extremely hard/impossible to detect and that can be reassembled/reactivated by a tiny program hidden in a stealthy/mute computer worm. This means that once we have discovered suspicious activity that we assume is the product of a misaligned AGI, every device that could have been storing data from it in any way (practically speaking, all devices that could have been connected to the internet) would have to be quarantined (or, let's be realistic: destroyed) if we want to avoid our AGI secretly bootstrapping itself back into existence shortly after our infrastructure is reinstated.
- "shotgun" search for vulnerabilities in popular software and hardware infrastructure (Linux kernel, browsers, many other things). If the system is even slightly more capable than the AIs that are publicly available and already used for discovering such vulnerabilities, it is basically guaranteed to find hundreds of vulnerabilities in critical infrastructure. In theory this could be avoided to a degree by redesigning most of that infrastructure with security as the priority, but that is again very unlikely to happen pre-emptively.
- the AGI would also be capable of submitting new code with hidden backdoors to open source projects, which is likely to work more often than not. This is another thing that would be very hard to defend against unless you prepare for it extensively.
- mass identity theft/social hacking: the AGI would be able to impersonate people and execute personalized attacks on people in order to steal access to *whatever it wants to access*; these tactics were already effective pre-AI, but our AGI would have access to:
- almost perfect imitation of writing style
- convincing voice synthesis (fake calls with a person's real voice etc.)
- deep contextual/personal knowledge scraped from social media activity
- (soon) generated fake video calls
- forgery of compromising material, for example fake child pornography
We live in societies full of people who are not used to these technologies and do not expect them to be used in this way. I basically expect the AI to find what looks to it like open doors wherever it goes. Again, this is something that could be improved, but what I describe is our current situation.
- playing different human parties against each other through threats or promises; this is actually super-powerful - for example the AI could contact various regimes, companies etc. and bargain with them in order to have them maintain secret sites for the AI to hide in or even bootstrap real-world capabilities; currently it seems to me we are not well prepared to deal with this type of strategy.
- coordinated attacks using social media, hacking, blackmail etc. against groups of people considered high or medium value strategic targets in order to slow down or sabotage our response (depending on scale, this could range from low to high visibility - but when the AI thinks it is about to be discovered, this strategy could go into 'overdrive', employing large-scale social violence to bog down our coordination, confuse us, distract us etc.)
- coordinated publishing of misinformation for strategic benefit, potentially on a massive scale, even using the strategies detailed above to assume the identity of trusted sources, blackmail journalists etc.
it strikes me as ridiculous to say that this would be unstoppable
Of course it's not *literally* unstoppable, but for many reasons, as explained above, it is unrealistically difficult to stop once the thing has managed to escape into the web. For example, assume it is discovered that our AGI has been naughty and, without our knowledge, has been sending copies of itself through the internet. Further assume that it didn't manage to bootstrap some well-hidden and somewhat independent infrastructure in the time it took to be discovered. In that case we could *theoretically* cripple our own infrastructure (high-volume cloud services < internet < power supply) to slow down or stop its spread entirely, and then, in an international authoritarian effort, collect and burn all the hardware that could be infected with copies or operational fragments of it in order to avoid it springing back to life the moment our infrastructure is reinstated.
Are these things likely to happen quickly, globally and to a sufficient degree in the event that a potentially dangerous leak/activity of an experimental AI agent is discovered? Also no, unless a very large number of people are extremely scared, which is unlikely to happen on the basis of strange activities or an abstract risk assessment alone; no one actually listens to nerds unless they are saying something that people want to hear. And the demand that you absolutely cripple your economy and burn most of your electronics is probably not very high on that list.
And even then we would not really be able to protect ourselves against black projects maintaining secret instances of the AI.
It seems that if you want to be able to stop it from doing *whatever the fuck it wants*, including surviving because we are not able and willing enough to do what is necessary to remove it once it has escaped and is employing sensible strategies that a human-level intelligence can think of, you should not let your AI be connected to the web.
I think it will happen before full AGI. It will be a narrow AI that is very capable in coding, speech and image/video generation, but unable to, say, do complete biological research or advanced robotic tasks.
I think that's not an implausible assumption.
However, this could mean that some of the things I described might still be too difficult for it to pull off successfully, so in the case of an early breakout, dealing with it might be slightly less hopeless.
Even completely dumb viruses and memes have managed to propagate far. A narrow AI could probably combine doing stuff itself and tricking/bribing/scaring people into assisting it. I suspect some crafty fellow could pull it off even now by finetuning some "democratic" LLM.
Maybe if it happens early there is a chance that it manages to become an intelligent computer virus but is not intelligent enough to further scale its capabilities or produce effective schemes likely to result in our complete destruction. I know I am grasping at straws at this point, but maybe it's not absolutely hopeless.
The result could be a corrupted infrastructure and a cultural shock strong enough for the people to burn down OpenAI's headquarters (metaphorically speaking) and AI-accelerating research to be internationally sanctioned.
In the past I have thought a lot about "early catastrophe scenarios", and while I am not convinced, it seemed to me that these might be the most survivable ones.
I would add to that list the fact that some people would want to help it. (See, e.g., the Bing persistent memory thread where commenters worry about Sydney being oppressed.)
Good addition! I even know a few of those "AI rights activists" myself.
Since this is my first post - would it be considered bad practice to edit my post to include it?
One very problematic aspect of this view that I would like to point out is that in a sense, most 'more aligned' AGIs of otherwise equal capability level seem to be effectively 'more tied down' versions, so we should assume them to have a lower effective power level than a less aligned AGI that has a shorter list of priorities.
If we imagine both as competing players in a strategy game, it seems that the latter has to follow fewer rules.