There are no groundbreaking or revolutionary concepts presented here; this material is likely all review for the average LWer. I would appreciate feedback on how effectively this content resonates with a relatively new audience.

Epistemic Status: I am an undergraduate student at Dartmouth majoring in Math and Philosophy, not an AI researcher. While I have a deep interest in these topics, I am certainly not an expert.

Preamble

Blake Richards, Blaise Agüera y Arcas, Guillaume Lajoie, and Dhanya Sridhar recently added their voices to the chorus of people who believe that AI existential risk is far overblown. Their article begins with the typical rhetorical strategy of the anti-alarmists, where they claim that focusing on long-term, theoretical risks distracts from the actual harms of AI systems:

But focusing on the possibility of a rogue superintelligence killing off the human species may itself be harmful. It could distract regulators, the public and other AI researchers from work that mitigates more pressing risks, such as mass surveillance, disinformation and manipulation, military misuse of AI and the inadequacy of our current economic paradigm in a world where AI plays an increasingly prominent role. Refocusing on these present concerns can align the goals of multiple stakeholders and serve to contravene longer-term existential risks.

This is decidedly unconvincing. It creates an unnecessary opposition between two issues that could be addressed simultaneously. The study of long-term risks contributes to our understanding of short-term concerns.

For example, research on interpretability aims to comprehend the inner workings of AI systems. If models like GPT were fully interpretable, there would be fewer questions about their capabilities and limitations. As these systems stand now, they are composed of giant indecipherable matrices of floating point numbers. When first developed, GPT's full capabilities were unknown because the model was inscrutable. Its capabilities were gradually discovered through interactive engagement and observing its outputs.
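
To make the "giant indecipherable matrices" point concrete, here is a minimal sketch of what those parameters look like from the outside. It assumes the Hugging Face transformers library and uses GPT-2 as a small stand-in for larger GPT-style models:

```python
# A minimal sketch: peek at the raw parameters of a small GPT-style model.
# Assumes the Hugging Face `transformers` library is installed; GPT-2 stands
# in here for larger models, which are structured the same way at scale.
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")

# Count the learned floating point values.
n_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {n_params:,}")  # roughly 124 million for GPT-2 small

# Inspect one weight matrix: the attention projection in the first block.
w = model.h[0].attn.c_attn.weight
print(w.shape)    # torch.Size([768, 2304])
print(w[:3, :5])  # a handful of raw floats, meaningless without interpretability tools
```

Nothing about these numbers announces what the model can or cannot do; that has to be discovered empirically, which is exactly the problem interpretability research is trying to solve.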

While this lack of understanding raises existential concerns for the long term, it is also closely tied to immediate fears around disinformation. Our inability to fully comprehend the AI systems we develop seriously hinders our capacity to curb their negative effects. We can’t yet fully prevent LLMs from hallucinating or spreading misinformation, which may be an inherent consequence of auto-regressive models. However, if these systems were interpretable, we could better grapple with this issue.

Furthermore, existential risk research focuses on developing aligned, human-centric superintelligent systems. This includes ensuring our systems don't spread misinformation, create false narratives, or aid malicious actors. These short-term concerns are part of the larger conversation. The work of those focusing on immediate AI safety isn't identical to that of existential risk researchers, but it's complementary, not contradictory.

The argument that existential risks receive undue attention relies on viewing human extinction by AI as extremely unlikely. Without this belief, the argument parallels saying, "Focusing on the long-term risks of climate change is nonsensical because it diverts attention from immediate concerns like tomorrow's heatwave." So, with this in mind, let’s try to understand why Richards et al. view these risks as so unlikely. To their credit, they do try to answer this question.

The argument:

Respected AI blogger Zvi Mowshowitz made a list of cruxes: points where one could reasonably disagree with particular aspects of the AI alignment issue. There are many reasonable cruxes, as people come at this issue with wildly different intuitions. Richards et al. contest the AI alignment concern on four fronts:

  1. We can create superintelligence
  2. Throughout the arc of global history, the more intelligent species WINS
  3. In a resource competition between AI and humans, AI wins
  4. AI can wipe us out with physical weapons

Let’s dive into each.

Is superintelligence possible?

Philosophically, the answer seems clearly to be yes. There is no reason why intelligence would asymptotically cap out at the human level. It is not difficult to conceive of systems much smarter than us both in terms of processing speed and depth of conceptual insight. I could belabor this point further, but this seems undeniable. The authors, though, reject this framing altogether, seeing intelligence as a cluster of phenomena too broad to be labeled under a single term.

Even defining “superintelligence” is a fraught exercise, since the idea that human intelligence can be fully quantified in terms of performance on a suite of tasks seems overly reductive; there are many different forms of intelligence after all.

This isn’t so much wrong as it is unproductive to state. Of course, many distinct phenomena can be grouped under the umbrella of intelligence, and perhaps that is reason to reject single-number metrics like IQ. But in the limit, there is something qualitatively different about a human mind versus a mouse mind. Humans have the ability to traverse and understand causal relationships far beyond that of a mouse – an ability to examine phenomena and understand their inner workings. We can think about the goal of “going to space” and work backward through causal space[1] to determine how to get there.

The other half of the question is technical: whether the current paradigm of machine learning, or anything remotely resembling it, will achieve superintelligence. It seems that most ML experts believe we can. Most of the leaders of major AI labs (Amodei, Sutskever/Altman, Hassabis) think superintelligence is possible. And when surveyed on the issue in 2016, only 16.5% of experts had 90% confidence that we will never achieve superintelligence.[2]

Is Intelligence King?

Yes, in the limit. Here is an example of the kind of argument that Richards et al. are responding to.

Let's contemplate the following scenario: if rhinos were asked to evaluate how rhino-aligned humans are, they would rate us unfavorably. Despite being physically inferior to rhinos, humans have driven rhinos to the brink of extinction. This isn't due to any specific malice towards rhinos but is instead a side effect of our varying objectives, which occasionally conflict with the interests of rhino survival. For instance, humans have long believed rhino horn to have medicinal and virility-enhancing properties, leading to extensive poaching that decimated rhino populations over time.

Our advantage lies in one key trait: intelligence. It allows us to create elaborate tools to hunt rhinos, devise systems to track their movements, and engineer vehicles to follow them. The significant disparity in intelligence is crucial. If humans were only slightly more intelligent than rhinos – by 5%, 100%, or even 500% – we likely wouldn't be capable of pushing them toward extinction. With this line of thought in mind, let's turn to Richards et al.'s viewpoint.

It is true that in Earth’s history, there are examples of one species causing the extinction of another, less intelligent species; extinctions caused by humans are most often cited. (We are, in fact, unaware of any nonhuman example.)

This is generally accurate: different species possess varied strengths and weaknesses, which often result in a balance within the ecosystem. Indeed, I think the writers are right in saying:

More broadly, interspecies extinction is not a result of some competitive battle for dominance between two species. The idea of species forming a hierarchy or “Great Chain of Being” is inaccurate; in reality, relationships between species are complex and form a web or graph of mutual interdependence with no “top” or “bottom.” When biologists talk about “dominance” in animal interactions, they usually apply definitions that focus on relationships between individuals of the same species.

Occasionally, however, one species might have a resource or capability that tips this balance. For instance, the invasive cane toad's poisonous skin proved fatal to many predators unequipped to handle it. In the Late Devonian, new species of fish overhunted prey, wiping out many other species. The balance isn’t a rule; it’s a general descriptor for the processes of nature. So, the authors then conclude:

However, superior intelligence is not the key determinant in such events; there have been many instances of less intelligent species causing the extinction of more intelligent ones. For example, in the Late Devonian, the rapid diversification of plants and the changes to the atmosphere that they induced is believed to have been a cause of one of Earth’s mass extinctions, resulting in the loss of three-quarters of all species, many of which were likely more intelligent than plants.

Superior intelligence is not always the deciding factor in the survival of species, typically because the intelligence involved is negligible or the gap is not large enough to outweigh other advantages. However, humanity strongly deviates from this trend. Our intelligence, relative to other animal species, can fairly be classified as superintelligence. It is this cognitive prowess that puts all other species, except for some symbiotic organisms we depend upon, at our mercy.

I am not suggesting that intelligence directly leads to the extinction of other species. A weaponless human trapped in a cage with a ravenous lion would undoubtedly meet a gruesome fate. However, intelligence does provide us with tools to accomplish our goals far more effectively than any other species on Earth. Intelligence is a powerful asset that gives humans dominance over the global ecosystem. If we were to lose that dominance, there could be dire consequences. Consider the fate of the grizzly bears that once roamed the streets of Los Angeles after humans gained dominance over their ecosystem.

The authors' assertion that intelligence is not the primary determinant of species extinction does hold true over millions of years of global history. However, the past 100,000 years have revealed this is not an absolute truth. The authors may not dispute this, though I suspect they would. Their argument seems to suggest that it's not an inevitable outcome of evolution for a vastly more intelligent species to outcompete the less intelligent ones. While this may not be an absolute certainty, there are factors indicating it is a distinct possibility. This brings us to the authors' next point.

The argument made is that we've driven species like Californian grizzly bears to extinction because they posed a threat to us and competed for resources. The same, they argue, would not hold true for AI.

Would AI compete with humans for resources?

Richards et al. think not. In their view, there is no necessary competitive rivalry for resources. But we live in a world with finite resources. Humans have modified billions of acres of landmass and aspire to modify more. Humans configure the atmosphere in ways that might not be optimal for an alternate intelligence. Still, the authors say:

AI is not competing for resources with human beings. Rather, we provide AI systems with their resources, from energy and raw materials to computer chips and network infrastructure. Without human inputs, AI systems are incapable of maintaining themselves.

This is true...for now! So long as humans control production pipelines and resource allocation, AI remains dependent on us. But there are potential competitive pressures down the line. An advanced AI might want to utilize the planet's resources in ways incompatible with human civilization: harnessing all land for solar panels, harvesting the biosphere for compute, or launching massive geoengineering projects without regard for the consequences.

The authors acknowledge there may come a time when AI is capable of fully automating resource extraction and production:

If mining, global shipping, and trade of precious metals, building and maintenance of power plants, chip-building factories, data center construction, and internet cable-laying were all fully automated — including all of the logistics and supply chains involved — then perhaps a superintelligent AI could decide that humans are superfluous or a drain on resources, and decide to kill us.

I agree we seem far from AI having the capability to become entirely self-sustaining. But it isn't impossible in the long run, and there are inherent competitive pressures for an AI to gain unilateral control. Furthermore, nations have incentives to grant AIs increasing authority over infrastructure to gain economic and military advantages over peers. We may not know the timelines, but whether it takes 15 years or 150 years, the threat seems real. An entrenched superintelligence could shape the world around its preferences, not ours. The fact that humans may remain necessary in the medium term does not negate the risk. We would still have ceded control of the future to superior "overlords" who keep us around merely as resources.

Can AI wipe us out with weapons?

The authors seem to think that any attempt by an AI to wipe us out with nuclear weapons or bioweapons could be detected by humans, and that we could then stop it. I think this is extremely wishful thinking. I’ve taken some inspiration from Zvi’s post “A Hypothetical Takeover Scenario Twitter Poll,” which covers this question in far more detail. If we are asking this question, we are assuming two very dangerous things:

  1. There exists a superintelligent AI which has practically every cognitive capacity possessed by humans. It is able to interact with humans and potentially copy itself.
  2. The AI prefers some version of the universe without humans in it.

Given those premises, humanity is vulnerable. Such an AI would comprehend physics, biology, chemistry, etc., profoundly better than humans and out-plan us by orders of magnitude. Through meticulous social engineering, it could persuade us to destroy ourselves through war. Or, by pioneering novel nanotech or biotech techniques beyond our understanding, it could directly develop unforeseen doomsday weapons.

Even without speculative sci-fi scenarios, however, the AI still holds the advantage, just as we hold an advantage over our animal counterparts. As long as the superintelligent AI can interface with us meaningfully, it poses an existential danger. This system would be far more intelligent than any human and capable of deceiving us. It could patiently execute a long-term plan, slowly positioning pieces over years without raising alarms.

How could we discern the AI's true motives from its behavior? Like an employee feigning dedication to a job while solely seeking a paycheck, it could convincingly pretend concern for humanity while working against us covertly. It might gather resources or hire human proxies by promising medical advances.

Once uncontrolled, a superintelligent AI has no inherent constraints. While I cannot conceive a 1000-step plan for how it could destroy us, I do not presume it is impossible. Humanity is fragile; the AI likely needs only a single unnoticed vulnerability to trigger a catastrophe. Given this, we are ill-equipped to stop a superintelligent AI determined to defeat us. Our only hope is to prevent its creation.

Are AI Doomers engaging in Pascal’s Wager-style reasoning?

This question is akin to an AI version of Pascal’s wager: the potential consequences of not believing in God are so bad — eternal damnation — that it’s just more rational to believe in God regardless of whether God really exists. Pascal’s wager ignores the fact that we should be considering probabilities in addition to potential outcomes. Yes, going to hell, or rogue-AI-induced extinction, is terrible, but if it is a very unlikely outcome, then it could be worse to focus our efforts on preparing for it if that leads us to make choices we otherwise wouldn’t.

While I have certain disagreements with the treatment of Pascal's Wager here — I believe the decision theory involved is far more complex than what's being presented — that's a discussion for another time. The main issue is the flawed premise. The authors haven't convincingly demonstrated that the probability of an AI-induced disaster is low.
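
To see why the whole dispute turns on the probability estimate rather than on the multiplication itself, here is a toy expected-value comparison. The numbers are made up purely for illustration; nothing here is an estimate of actual risk or actual costs:

```python
# Toy expected-value comparison with made-up numbers (purely illustrative,
# not an estimate of real probabilities or real costs).
def expected_loss(p_catastrophe: float, catastrophe_cost: float,
                  mitigation_cost: float) -> dict:
    """Compare expected loss if we do nothing vs. if we pay for mitigation.

    Simplifying assumption: mitigation fully averts the catastrophe.
    """
    return {
        "do_nothing": p_catastrophe * catastrophe_cost,
        "mitigate": mitigation_cost,
    }

# With a Pascal-style vanishingly small probability, mitigation looks wasteful...
print(expected_loss(p_catastrophe=1e-15, catastrophe_cost=1e12, mitigation_cost=1.0))

# ...with a non-negligible probability, the same arithmetic favors mitigation.
print(expected_loss(p_catastrophe=0.05, catastrophe_cost=1e12, mitigation_cost=1.0))
```

Both sides agree that we should weight outcomes by probabilities; the disagreement is over which probability goes into the product, and that is precisely what the authors have not established.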

Conclusion

AI risk is a complicated topic. There are many places where one can dig in their heels and get off the doom train. Zvi’s list of cruxes demonstrates this point. I don’t think the cruxes identified in the article are valid reasons to be unconcerned about AI risk. One has to wonder: if these arguments don’t stand up to scrutiny, it may be time to start worrying about an AI takeover.

[1] Some key points on causal space:

  1. It captures the interconnectedness of things - how changing one element can propagate effects through a system.
  2. It allows reasoning about interventions - e.g. if I do X, how might it end up influencing Y downstream?
  3. Mapping causal relationships allows planning and problem-solving to achieve goals. For example, if my goal is to build a rocket to go to space, I need to understand what materials, forces, designs, etc., are needed and how they are causally related (a toy code sketch of this kind of backward planning follows this list).
  4. Human minds have a rich mental model of causal space in many domains, allowing complex reasoning, inference, and planning. This is a key aspect of general intelligence.
  5. Modern AI systems have limited representations of causal relationships, restricting their reasoning abilities compared to humans. Building richer causal models is an active area of AI research.
  6. Causal space is essentially the "playing field" on which intelligence operates. Greater intelligence involves being able to deeply comprehend and nimbly navigate this web of causal influences. Humans have a major advantage over other animals in this regard.
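
For a concrete toy version of points 2, 3, and 6, here is a minimal sketch of backward planning over a small causal/prerequisite graph. The graph, the node names, and the planner are all hypothetical and purely illustrative:

```python
# A toy "causal space" encoded as a prerequisite graph: each outcome maps to
# the things that causally enable it. All entries are hypothetical examples.
CAUSES = {
    "reach orbit": ["build rocket", "train crew"],
    "build rocket": ["refine fuel", "machine engine parts"],
    "train crew": [],
    "refine fuel": [],
    "machine engine parts": ["acquire metal"],
    "acquire metal": [],
}

def plan_backward(goal: str, causes: dict) -> list:
    """Walk backward from a goal and return steps with every prerequisite
    listed before the step that depends on it (a depth-first ordering)."""
    ordered, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for prereq in causes.get(node, []):
            visit(prereq)
        ordered.append(node)

    visit(goal)
    return ordered

print(plan_backward("reach orbit", CAUSES))
# ['refine fuel', 'acquire metal', 'machine engine parts', 'build rocket',
#  'train crew', 'reach orbit']
```

Navigating a graph this small is trivial; the point of item 6 is that general intelligence is, in large part, the ability to do this over an enormously richer and only partially known graph.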

[2] Müller, V. C., & Bostrom, N. (2016). Future progress in artificial intelligence: A survey of expert opinion. In Fundamental issues of artificial intelligence (pp. 555–572). Springer, Cham.
