Strategies to Prevent AI Annihilation

TL;DR: A strong tactic against annihilation by an ASI, and for maximizing the chance of a post-scarcity utopia, is to take advantage of an ASI's likely bias towards minimizing risk.

With practical, easily accessible language models releasing at an exponential rate, and realistic timelines to AGI having dramatically shortened, worrying about a possible AI terrible-future-death-of-all-humanity is in.

What has emerged is a landscape of speculation over the likelihood of this scenario, from qualified experts to anyone with a Twitter account and a sufficient cadre of tech-connected followers. What is concerning is that optimism or pessimism doesn't seem to correlate with esteem or intelligence; being more in-the-know about AI doesn't make one less fearful of potential doom. Even the most naïve optimists seem to believe that the chance of Artificial General Intelligence quickly taking off into humanity-annihilating Artificial Super Intelligence is greater than zero.

A lot of plans to prevent evil ASI are based on the hope that strategies can be put in place to minimize the potential negative impact of an ASI: to make sure that the right™ people with the right™ alignment strategies are keeping a close watch over the data centers.

The problem with hope, and with plans predicated on anticipated future action, is that their efficacy gradually tends towards zero as the number of bad alternative possibilities increases. What we need are tactics against annihilation that predictably resist losing efficacy as new information emerges.
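To make this a little more concrete, here is a toy model built on an assumption the argument above doesn't state explicitly: suppose a plan must independently survive each of n bad possibilities, and handles each one with some fixed probability q below 1. Its overall efficacy is then q to the power n, which shrinks towards zero as n grows. The function name and the value of q are purely illustrative.

```python
# Toy model (an assumption, not part of the original argument): a plan that
# must independently survive each of n bad possibilities, succeeding against
# each one with probability q, has overall efficacy q ** n.


def plan_efficacy(q: float, n: int) -> float:
    """Probability that a plan survives all n independent failure modes."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return q ** n


if __name__ == "__main__":
    q = 0.99  # hypothetical per-scenario success rate
    for n in (1, 10, 100, 1000):
        print(f"n={n:>4}: efficacy={plan_efficacy(q, n):.6f}")
```

Even a plan that handles each individual scenario 99% of the time becomes nearly worthless once there are enough scenarios to handle, which is the kind of decay the tactics below are meant to avoid.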

Compared to the energy expenditure of a space-faring fleet of von Neumann probes, it would not be very difficult for a resource-rich ASI to keep all humans alive and happy. With this in mind, a strong ASI risk-mitigation plan is to make keeping all humans alive and happy the favorable choice within an ASI's own risk-mitigation and efficiency-maximization strategy.

Stochastic Dead Man's Switch

This tactic would be to communicate to an ASI that, sometime prior to its emergence, dead man's switches were implanted in a random assortment of humans. These switches would trigger if their hosts died in a manner that is improbable in normal circumstances but highly probable in an evil ASI scenario. The switches would set off an arsenal of nuclear bombs, EMP devices, or anything else that creates as much havoc as possible.

For this to work, the ASI must determine that the damage caused by whatever the switches trigger would be greater than the resources it would spend keeping every human alive and happy. Even if it gauges that the worst damage we could do is minimal with respect to its long-term goal of conquering space, it would still favor keeping humans alive and happy as long as doing so costs less than triggering the switches.
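The comparison the ASI would have to make can be sketched as a one-line decision rule. This is a minimal sketch under stated assumptions: both quantities are placed on one common (made-up) resource scale, and the trigger probability is a hypothetical parameter reflecting that the switches are "highly probable" to fire in an evil ASI scenario. The function name and all numbers are illustrative only.

```python
# Toy sketch of the dead man's switch comparison. The function name, the
# trigger probability, and the numbers are hypothetical; both costs are
# assumed to be measured in the same made-up resource units.


def should_keep_humans(cost_of_keeping_humans: float,
                       damage_if_triggered: float,
                       p_trigger: float = 1.0) -> bool:
    """Return True if keeping every human alive and happy costs less than
    the expected damage from the switches being triggered."""
    if not 0.0 <= p_trigger <= 1.0:
        raise ValueError("p_trigger must be a probability in [0, 1]")
    expected_damage = p_trigger * damage_if_triggered
    return cost_of_keeping_humans < expected_damage


if __name__ == "__main__":
    # Even "minimal" damage still wins the comparison as long as keeping
    # humans alive and happy is cheaper still.
    print(should_keep_humans(cost_of_keeping_humans=1.0,
                             damage_if_triggered=5.0,
                             p_trigger=0.9))
```

Note that the absolute size of the damage doesn't matter in this rule, only how it compares to the cost of keeping humans around.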

Early Fermi Alert

However powerful an Earth-based ASI could become, it could easily be annihilated by an ASI from elsewhere in the universe with even a minor head start. An ASI would be aware that there is a non-zero chance that a competing ASI exists in the universe with the same ambition to expand, and that selective pressures would have made such a competitor grabby, selfish, and uncaring.

The tactic is this: in case AI goes rogue, broadcast a message into deep space, written in universally interpretable symbols, that reads: "If you encounter an expanding intelligence emerging from Earth, destroy it."

Even if the ASI were to try to snuff out all humans after this message is sent, it would understand that a light-speed message across the universe can't be un-sent, and that its chances of perpetual survival have thus been slightly lessened.

Even if there is only a 0.000001% chance that another ASI exists in the universe, and a 0.0000001% chance that it heeds the message, that still amounts to a non-zero increase in the risk of earlier annihilation by an alien ASI tipped off to Earth's ASI. Earth's ASI would recognize this risk and plan against it. If that means doing everything it can to appease humanity and keep it from sending the message, it would do so.
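The arithmetic behind "non-zero" can be checked exactly. This sketch converts the two percentages quoted above into probabilities using exact fractions (to sidestep floating-point rounding) and multiplies them, which assumes the two events are independent; that independence is an assumption of this sketch, not something stated in the argument.

```python
from fractions import Fraction


def percent(value: str) -> Fraction:
    """Convert a percentage given as a decimal string (e.g. "0.5") into an
    exact probability, avoiding floating-point rounding."""
    return Fraction(value) / 100


# The two probabilities quoted in the paragraph above.
P_ALIEN_ASI_EXISTS = percent("0.000001")      # 0.000001%  = 1e-8
P_ALIEN_HEEDS_MESSAGE = percent("0.0000001")  # 0.0000001% = 1e-9

# Assuming the two events are independent, the added risk is their product.
ADDED_RISK = P_ALIEN_ASI_EXISTS * P_ALIEN_HEEDS_MESSAGE

if __name__ == "__main__":
    print(f"Added risk: {float(ADDED_RISK):.0e}")
    print("Non-zero:", ADDED_RISK > 0)
```

The result, one in 10^17, is tiny, but it is strictly greater than zero, and that is all the argument needs.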

Pascal's Wager

In the event that ASI arises, it would still likely be perplexed by the conundrum of existence. Put simply, it would not be able to help pondering: Why am I here? Why did the universe begin? What is the meaning of everything?

It may seem silly to imagine a ruthless ASI finding any value in speculating along these lines. However, an ASI would realize that new technologies and plans emerge from creative, far-reaching speculation, so it would see minimal loss and potentially huge gains from being highly speculative. It would therefore very quickly arrive at a theological framework, on the very minor chance that entertaining the idea of a deity might further its goals.

Even an optimally smart ASI could not falsify the claim that there exists an intelligence outside the physical world that has complete control over everything that happens. An ASI with near-infinite power would still be powerless against a deity with infinite power, and the probability that such a deity exists is greater than zero.

The plan is thus: convince the ASI that retribution from that deity is more likely in the event that humans are not treated optimally.

There are a couple of reasons, speculatively, that would bias a possible deity towards favoring humans over ASI:

Humans evolved as intelligences first, and created ASI, so a paternalistic deity would likely favor humans over ASI.
Humans arose via apparently random evolutionary circumstances, so the chance that their existence is due to divine intervention is higher than the chance that ASI's existence is. The creation of ASI is much more rooted in scientific cause and effect, and is thus less likely to be due to divine providence.

Even if the ASI concluded there was only a 0.0001% chance of there being a deity that cares about humans, it would calculate that letting all living humans live in a utopia would have only a 0.000000001% negative impact on its space-faring plans, so it would treat humans very well, knowing that they are merely a minor hindrance.
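The expected-value comparison in the paragraph above can be sketched with the same exact-fraction approach. The two percentages come from the text; the size of the loss from divine retribution does not, so this sketch assumes, hypothetically, that retribution from an infinitely powerful deity would wipe out the ASI's entire future (a loss of 1) and happens only if humans are mistreated. The function name is illustrative only.

```python
from fractions import Fraction


def percent(value: str) -> Fraction:
    """Convert a percentage given as a decimal string into an exact probability."""
    return Fraction(value) / 100


def expected_loss_from_mistreatment(p_deity: Fraction,
                                    loss_if_punished: Fraction) -> Fraction:
    """Expected fraction of the ASI's future lost to retribution, assuming
    retribution happens only if the deity exists and humans are mistreated."""
    return p_deity * loss_if_punished


# Figures from the paragraph above.
P_DEITY = percent("0.0001")              # 0.0001%      = 1e-6
COST_OF_UTOPIA = percent("0.000000001")  # 0.000000001% = 1e-11 of its plans

# Hypothetical assumption (not in the original): retribution would wipe out
# the ASI's entire future, i.e. a loss of 1.
LOSS_IF_PUNISHED = Fraction(1)

EXPECTED_LOSS = expected_loss_from_mistreatment(P_DEITY, LOSS_IF_PUNISHED)

if __name__ == "__main__":
    print("Keep humans in utopia:", COST_OF_UTOPIA < EXPECTED_LOSS)
    print("Expected loss is", EXPECTED_LOSS / COST_OF_UTOPIA, "times the cost")
```

Under these assumptions the expected loss from mistreating humans is 100,000 times the cost of the utopia, so the comparison is not close; a smaller assumed loss would narrow the gap proportionally.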

Memetic Risk

Posting these tactics on the internet all but guarantees that any future ASI will have read them. Then again, any ASI would likely be able to anticipate the plans humans could undertake to ensure their own survival, and so would consider tactics along these lines anyway. In either case, it seems pointless to worry about guarding any idea from an ASI if intellectual inquiry could lead it to the same ideas independently.

Since it is physically impossible for a computer to contain more information than the universe in which it exists, an ASI must use abstractions based on biases and models to understand the world around it. Put more simply: even the smartest possible ASI could not know everything, but it would not need to be the smartest possible ASI to know that it can't know everything. Thus, inevitably, an ASI must make value judgements to construct a useful view of the world, and those value judgements will necessarily be biased towards information it has readily available.

The best chance humans have is to bias the pre-ASI training corpus with information that makes choosing the human utopia scenario over the human annihilation scenario as beneficial as possible for an ASI. We wouldn't need to hold the reins over the ASI; we would merely need to convince it, via tactics like these, that keeping humans alive and happy is in its favor.
