P(misalignment x-risk|AGI) is small #[Future Fund worldview prize]

The question (from the FTX Future Fund) is: what is

P(misalignment x-risk|AGI): Conditional on AGI being developed by 2070, humanity will go extinct or drastically curtail its future potential due to loss of control of AGI.

I’m new to this community but I don’t think we should be computing the probability of a ‘risk’, so my terminology after this will be ‘P(misalignment x-event|AGI)’

By the way, the framing of this piece is very antagonizing toward AI-longtermists (you’ve been warned), because I truly believe you just need someone to yell at you and tell you that you are wrong—and in this case also take your money.

I believe that the fundamental mistake in all AI-longtermist thinking has to do with overconfidence. AI-longtermists binarize the events of our future into “misaligned AI x-event” or “no misaligned AI x-event”, and thus they attempt to correct for overconfidence by making the probability of “misaligned AI (or not)” tend toward 50/50. This is a generous attempt, but they fail to consider that they were overconfident in their original binarization.

Indeed, they fail to consider that within the category of “no misaligned AI x-event” there is a whole spectrum of possible ways we can coexist with AGI, including many unknown unknowns. A more drastic error still is the overconfidence that these are the only two real categories. Suppose people become even more connected to their computers (think replacing search engines and social media with AGI), perhaps inseparably so. Then the question is ill-posed, because people as we know them today no longer exist: we become some melding of mind and machine.

And yet, when it comes to “misaligned AGI x-event”, there are two images everyone fixates on: the computer in the server hacking its way out and seizing control of the world’s cyberspace, or the killer robot that is programmed to defend the world and decides the best way to do so is to destroy. It’s like people just like watching movies!!! The bigger “AGI”-related risk is probably an “AGI” robot working in a biohazard lab that trips (or perhaps swerves to avoid running into a toddler), breaks a vial, and releases some super-smallpox into the world. I would argue that this “AGI toddler-saving tripping event” doesn’t even fall in the category of “misaligned AGI x-event”; it’s more garden-variety “robot has safety measure, follows safety measure, unintended consequences”. The key in this hypothetical is that we never “lost control” of this AGI; we just failed to consider a case in our programming.

I will continue to enumerate more arguments in this flavor or in others. But the common line of reasoning is this: if you think P(misalignment x-event|AGI) is large, you have watched too many movies and you are not creative enough. In my honest opinion, you watched a few movies on killer AI, talked about it with your friends, and now this occupies a larger share of your headspace than the thousands of other possibilities out there. Your judgement is clouded. $P(\text{misalignment x-event} \mid \text{AGI}) \ll 5\%$.

Let’s try to bound the misalignment x-risk instead of computing it. Clearly $P(\text{misalignment x-event} \mid \text{AGI}) \leq P(\text{x-event} \mid \text{AGI}) \leq 1 - P(\text{no x-event})$.

Ok, so what are the non-misalignment x-events that could occur (supposing we have AGI and the year is after 2070)?

Aliens come and death star Earth
Aliens come and turn Earth into a slave colony
We are already controlled by an alien (or terrestrial) Illuminati which decides they are done with Earth, or perhaps destroy Earth through infighting
We decide to genetically or computationally enhance ourselves as humans (homo deus-style)
A subset of humans genetically or computationally enhance themselves and kill the rest of us or evolutionarily outcompete us
Nuclear war
Super-chimera monster goes on world destroying rampage
Big asteroid hits Earth
Global food-source is poisoned
Sun dies and we are trapped on Earth
Our universe’s simulation gets shut down
Super-natural god(s) or being(s) exist in our current universe and smite us down
Super-viruses wipe us all out (directly)
Super-bacteria wipe us all out (directly)
Super-fungi wipe us all out (directly)
Genetically engineered mosquitoes or other organisms (maybe escaped, maybe purposeful) spread DNA which leads to our extinction
Brain-eating amoeba eats our brains
We bring back the dinosaurs, and it turns out they have super-speed, are bullet-proof, breed like rabbits, and hunt humans down
A suicidal cult takes over the world (and everyone commits suicide)
The world isn’t interesting anymore and no one wants to bring children into it
The world is too interesting, and no one wants to be bothered by raising children
Everyone lives in VR, drugged up on happiness injections.
Anyone/any group/terrorist group deliberately programs giant killer robots to kill everyone (note not misalignment!)
Two (or more) groups program giant killer robots to kill each other and end up killing everyone (note not misalignment!)
Nuclear weapons hacked by an individual/group blow up the world
The world turns against globalization and appoints dictator after dictator who wage territorial conquests and self-isolate, plunging humanity into an inescapable dark age
Humanity is not able to handle light-years of separation and slowly uses up all the resources in our immediate solar system.
Special types of space radiation affect the long-term virility of humanity so we can’t expand beyond Earth and its magnetic protections
Death by a thousand cuts: humanity encounters a string of catastrophes that make future prosperity impossible (trapped in local minima). There are about a bajillion of these “catastrophe strings”

You need not accept that all these events will happen, only that they are plausible to some degree and that they are representative of a greater range of possibilities. And before anyone pounces on the list of “super” entities, keep in mind that the problem presupposes the existence of a “super” AI. Let us say there are $n = 29$ of these non-misaligned-x-events as we've enumerated and they are all independent and identically distributed. Let’s pair each one up with the misaligned-AGI x-event and ask how likely it is that the paired event is more probable than the misaligned-AGI x-event.

Whatever the probability is, let’s say the epistemic uncertainty is such that you can be no more than 80% sure that misaligned AGI is more probable.

Let us also say that the probability humanity doesn’t go extinct is 30% (we’ve never really been super close to extinction yet). Let us also say that there is a 50% chance of “non-misaligned-AGI unknown unknown x-events”. The “unknown unknown” claim is based on the fact that we mostly count x-events based on technology-driven extinction, and I’m not confident that we know of the technology that could lead to our extinction given our understanding of the world (p = 30%) or that our current understanding of the world captures the technology that would cause human extinction (p = 30%). We may have to go through another “physics revolution” to understand the universe sufficiently to understand human extinction risk.

Thus, we have that $P(\text{x-event} \mid \text{AGI}) \leq 0.7$ and thus, factoring in the "unknown unknown x-events", that $P(\text{enumerated x-event} \mid \text{AGI}) \leq 0.35$, where the list of enumerated x-events includes misaligned-AGI.
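Spelled out as arithmetic, under one reading of the numbers above (my assumption: the 50% for unknown unknowns is applied as a share of the total x-event probability, since that is what turns 0.7 into 0.35):

$$
\begin{aligned}
P(\text{x-event} \mid \text{AGI}) &\leq 1 - 0.3 = 0.7 \\
P(\text{enumerated x-event} \mid \text{AGI}) &\leq 0.7 \cdot (1 - 0.5) = 0.35
\end{aligned}
$$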

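Since we can be no more than 80% sure that misaligned AGI is more probable than any given alternative, each alternative has at least a 20% chance of being equally or more probable. So, we have an expected value of $29 \cdot (1 - 0.8) = 5.8$ non-misaligned-AGI x-events which have probability equal to or greater than that of a misaligned-AGI x-event. Splitting the 0.35 evenly across those 5.8 events plus the misaligned-AGI x-event itself, we have an upper bound of $0.35 \cdot \frac{1}{6.8} \approx 5\%$ on the probability of a misaligned-AGI x-event.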
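The whole estimate can be reproduced in a few lines, which also makes it easy to plug in your own numbers. This is only a sketch of the arithmetic as I read it: the function name and parameter names are made up for illustration, the 80% is read as leaving a 20% chance that each alternative is at least as probable as misaligned AGI, and the 50% for unknown unknowns is read as a share of the total x-event probability.

```python
def misalignment_upper_bound(
    n_alternatives: int = 29,          # enumerated non-misaligned-AGI x-events
    p_agi_more_probable: float = 0.8,  # max confidence that misaligned AGI beats a given alternative
    p_no_x_event: float = 0.3,         # probability humanity does not go extinct
    p_unknown_unknown: float = 0.5,    # share of x-event probability given to unknown unknowns (assumed reading)
) -> float:
    """Upper bound on P(misaligned-AGI x-event | AGI) under the post's assumptions."""
    # P(x-event | AGI) <= 1 - P(no x-event)
    p_x_event = 1.0 - p_no_x_event
    # Remove the share reserved for unknown-unknown x-events.
    p_enumerated = p_x_event * (1.0 - p_unknown_unknown)
    # Expected number of alternatives at least as probable as misaligned AGI.
    expected_rivals = n_alternatives * (1.0 - p_agi_more_probable)
    # Split the enumerated mass evenly over the rivals plus misaligned AGI itself.
    return p_enumerated / (expected_rivals + 1.0)


bound = misalignment_upper_bound()
print(f"Upper bound: {bound:.4f}")
```

With the default values this gives roughly 0.05. If you are more confident that misaligned AGI outranks each alternative, raise `p_agi_more_probable` and the bound goes up; with more enumerated alternatives it goes down.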

-- written by Dibbu for the “Future Fund worldview prize”

P.S. I think even this bound is too high; misaligned-AGI x-events are overestimated for many other reasons. First, if one entity develops AGI, others will likely do so too, and many AGIs together will cancel each other out. Second, the development of a superintelligence does not entail extinction (see all the other organisms in our world, like ants and mosquitoes; I think no one likes them, yet they persist). Third, machines are fundamentally outcompeted by humans. Humans just need to eat food (almost literally anything) and have sex to reproduce and continue. Machines need to go through a complex supply chain to persist in physical form, and if they retain no physical form, we can just unplug them…
