AI as a Civilizational Risk Part 5/6: Relationship between C-risk and X-risk

The general model presented in the previous parts suggests that the US, and potentially other parts of the world, will suffer some form of civilizational collapse, potentially as early as 10 years from now. In the absence of decisive action to limit the spread of narrow behavioral modification AIs, such collapse is inevitable. The optimization pressure of social media points only towards the destruction of social cohesion. The collapse of civilization is a big deal. However, it is not the same as the even more serious subject of existential risk.

I want to discuss the relationship between civilizational and existential risk, specifically P(x-risk | unaligned AGI) by 2070. I would argue that this probability is higher than 35%. The exact quantity depends significantly on the definition of "reduction in human potential."

Foom Speed

Understanding the pathway that civilization takes towards AGI is essential for predicting which safety research is likely to happen by default. It is also essential for understanding the public perception of safety. Suppose my theory of slow civilizational decline is correct. In that case, sooner or later, people will need to start identifying the overuse of behavioral modification, persuasion, and scam AIs as one of the core causes of civilizational collapse.

The speed with which an AGI could improve itself is an interesting question. People have described models of "fast foom," in which self-improvement is rapid, on the scale of seconds. In this scenario, it is plausible to go from something more intelligent than humans at programming to something significantly more intelligent than humans, or than all other optimization in the world combined.

I tend to doubt this view. I suspect that two factors will limit the speed of foom. One is how quickly an AI can acquire real-world feedback. If real-world feedback is vital for self-improvement, then improvement could well proceed on a human timescale of weeks. The other is the question of how much previous AI improvement was bottlenecked by humans in the first place. Suppose we are in a situation where humans try to improve AIs by running very long and complex algorithms and then coming back several days later to check on them. Then taking humans out of that loop will not necessarily speed up the process by much, because humans are a tiny portion of that loop. It seems very likely to me that, while foom is possible, it is going to proceed on human timescales.
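The bottleneck argument above can be made concrete with Amdahl's-law-style arithmetic. The sketch below is illustrative only: the function name `loop_speedup` and the 5% human share of loop time are assumptions made for the example, not measurements of any real AI research loop.

```python
import math


def loop_speedup(human_fraction: float, human_speedup: float) -> float:
    """Overall speedup of an improvement loop when only the human part gets faster.

    human_fraction: share of total loop time spent waiting on humans (0..1).
    human_speedup:  factor by which that human share is accelerated
                    (math.inf models removing humans from the loop entirely).
    """
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    if human_speedup <= 0:
        raise ValueError("human_speedup must be positive")
    # Time left after the change, as a fraction of the original loop time.
    remaining = (1.0 - human_fraction) + human_fraction / human_speedup
    if remaining == 0.0:
        return math.inf
    return 1.0 / remaining


# Hypothetical numbers: humans account for 5% of each multi-day training cycle.
no_humans = loop_speedup(human_fraction=0.05, human_speedup=math.inf)
print(f"Speedup from removing humans entirely: {no_humans:.3f}x")
```

Under these assumed numbers, removing humans entirely makes each cycle only about 5% faster. The result flips if humans are most of the loop, which is why the size of the human share is the crux of the argument.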

Now, this is not a cause for celebration. Just because something is not instant does not at all mean that humans are capable of stopping it. Stopping it would require presence of mind and social coordination significantly higher than today's.

Optimism and Pessimism

Suppose you are coming over from the model of the great society, which marches upwards, unaware of itself, into an immediate doom. In that case, my model has a few key differences, some of which are reasons for optimism and others for further pessimism.

The main reason for optimism is that astute observers are likely to have time to understand that bad things are happening. The highly questionable "good news" is that the economic destruction of large parts of civilization could potentially remove the funding from many destructive AI projects. Society would have to fall pretty far for this to be the case, but it is possible. If the West collapses, other countries with different cultures may learn from its mistakes. It sounds dystopian to say that a civilizational risk could reduce existential risk. However, I am not saying that civilizational risk is a net positive. Instead, we should work on solving both the smaller and the bigger problems, realizing that the smaller problems are frequently an instance of the big meta-problem.

There are also reasons for pessimism. COVID has shown that civilizational capacity is declining overall, partly because social media tends to promote ideas other than the best ones to the top of everyone's newsfeeds. As civilizational capacity declines, we can no longer learn from our mistakes. Individuals can learn from mistakes, but for a civilization to learn from mistakes, individuals with proper solutions to problems need to be recognized, and mistakes ought to lead to a loss of decision-making capacity. However, as narrow AI takes over the discourse, it pollutes the signal space, making it harder to connect the proper signals of popularity, availability, and trustworthiness to the capacity to solve problems.

Once again, I can point to the desire to give the EcoHealth Alliance more funding for coronavirus "research" as an example. The capacity to learn, especially at the meta-level, needs to be improved, and if a civilization collapses, it is easy to misidentify the factors that led to the collapse. Failure to properly analyze such factors is present in the West and in other civilizations. If the West collapses, China might say that "they have collapsed because they were not communist or were too liberal, or did not follow the Chinese way, the Chinese philosophy." This belief is unlikely to give them the proper push at the meta-level to slow down their AI development or make it safer.

Another reason for pessimism is that most AI safety development is in the West. The collapse of the West may also mean that those safety efforts slow down, unless the current researchers are willing to move to other countries, whose governments are a priori likely to listen to them even less than the current government does. Furthermore, certain "AI ethics" efforts, such as blocking chatbots from expressing right-wing views, may be perceived as associated with the Western political order. Given this perception, people on the right and in other civilizations who oppose Western political ideals may falsely view AI as an "enemy of my enemy" and also view it as somehow capable of being an "objective" arbiter of reality.

Again, this is false. As many people have pointed out, AI is a vast category of algorithms, and some of these algorithms will be as different from each other as they are from humans. It is challenging to generalize about all AIs, especially in a positive direction. It is easier to make correct negative generalizations about them being unaligned or unsafe, since the space of possible utility functions is vast, and most do not correspond to human values. A widespread perception of AI as "objective" is a cause for concern in a post-West world.

The bottom line is that AGI has to be designed by a group with high social cohesion. Otherwise, the group is likely to create internal coordination points which are anti-correlated with human utility. Many nations might have higher cohesion than the West. However, social cohesion of other nations is not high enough to make them immune from this problem. We must counteract and defend against narrow AIs lowering social cohesion to ensure such a group can exist.

Probability estimates of “reduction of future potential”

What, specifically, is the probability of x-risk that I would give for 2070? By this I mean the probability, conditioned on unaligned AGI, of human extinction or a significant reduction of future potential. The answer primarily depends on a precise definition of a significant reduction of future potential. We are talking about an expectation of what humans would be capable of achieving if we were either able to align an AGI or able to move forward with civilization without using AGI at all.

Assuming any reasonable definition, I would argue that this probability will be higher than 35%. I would argue it is somewhere between 50% and 100%. The reason for 100% is an argument that "a large reduction of future potential" has already happened if we consider a past counterfactual.

Definition of “reduction of future potential”	My P(extinction or reduction of future potential | unaligned AGI) by 2070
Billions of people with mental health problems	100%
End of the US and Western civilization’s ability to affect the future	92%
Global economic stagnation	80%
Global dystopian lock-in	75%

As I mentioned in part 1, social media could have taken a different path and avoided causing mental health problems. If we take this as a counterfactual possibility, we already are in the category of a significant reduction of future potential today, in 2022. This possibility does not involve an AGI but rather narrow AIs. Suppose we define "a large reduction of future potential" as the loss of mental health of billions of people over a decade. In that case, the probability in the question is 100% by default. What should the probability be if a "significant reduction of human potential" means something larger than that?

There are many questions about which historical events would qualify as a "significant reduction of future potential." One controversial example is the Black Death. Was the scale of death and devastation caused by the Black Death massive? Of course. It was a traumatic event that reverberated for centuries. However, how much future potential it destroyed is a big question. In the grand scheme of things, if all goes well with humanity and we look back after millions of years, we might find that the blip left by the Black Death was somewhat insignificant.

If Western civilization falls apart and cannot shape humanity's future, likely leaving that question to the global East/South, does that qualify? As I have argued previously, the fall of Western civilization is highly likely due to drops in social cohesion, even without AGI. Why such high confidence? The issue is that there are currently no algorithmic pressures to increase the core civilizational variable of social cohesion. At the same time, there are algorithmic pressures to find wedge issues that decrease it. Even highly ambitious socio-algorithmic projects, such as a network state, can be seen as a way to move a socially cohesive group outside the sphere of influence of the US (or, as the book puts it, the NYT). Even this project, ambitious as it is, does not plan to increase the social cohesion of the US as a whole. Finding algorithmic ways to increase social cohesion, or at least not to decrease it, is necessary, though more than that is needed for a fully aligned AGI. The absence of AGI alignment in 2070 is pretty strong evidence that social cohesion has reached the "collapse point."

What kinds of events would be worse for humanity than the Black Death or the end of Western civilization while not being extinction events? I predict that AGI can create "lock-ins" in which it ushers in a dystopian future. A lock-in is a future where an AGI keeps optimizing the metric it was initially designed for and avoids causing extinction only because the metric depends on people being alive to do something. However, this would still involve taking its utility function to its logical conclusion. Bostrom and other thinkers have described these types of scenarios.

One potential example is the complete digital addiction of all people to the future analogue of social media. You can imagine humanity spending all its time, from womb to tomb, staring at the same screen, yelling at each other, while living only a short life of, say, 35 years. Actual progress in anything health-related stops, and barely enough functional pieces are left to maintain a civilization. This dystopia could go on for thousands of years. Would this disturbing outcome qualify as a significant reduction of future potential, even if humanity does get out of it at some point? I would say yes. Suppose AI begins to specifically target and kill people who are intelligent enough to oppose it. In that case, AI could select many positive traits out of the population, destroying its capacity to resist or to reinvent civilization. The scale of a Dark Age brought about by behavioral modification AI could be vast. Hundred-thousand-year lock-ins into bad algorithmic and societal architecture can drastically change the speed at which humanity progresses, even if it gets out of them. These kinds of lock-ins are much more likely to exist as extreme versions of the problems we see with behavioral modification today.

I put a high probability on an AI, or a group of AIs, designed explicitly for propaganda being the one that achieves a decisive advantage. After all, it can mobilize people to its cause and rally them against potential competing AGI projects, or towards wars against other great powers to prevent them from building AGIs. If we extend the line of today's problems to its logical conclusion, the dystopian possibility is significant. Given that we will likely be in a multi-polar scenario of multiple behavioral modification AIs competing against each other, they can block each other's excesses in terms of killing people. Modifying the behavior of humans is challenging if there are no humans. Human extinction is still plausible if wave after wave of lab leaks of AI-created biological weapons gets unleashed. However, I put the probability of human extinction at ~15%, and most of the probability mass (35-85%) lies in a significant reduction of future human potential.
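As a quick consistency check, the figures in this section can be added up. The sketch below assumes that "extinction" and "significant reduction of potential without extinction" are mutually exclusive outcomes, so their probabilities add. That reading is an interpretation of the text, and the helper name `combined_range` is hypothetical.

```python
# Figures quoted in the text, in whole percentage points.
EXTINCTION_PCT = 15               # "~15%" for human extinction
REDUCTION_RANGE_PCT = (35, 85)    # "35-85%" for reduction of potential

# Per-definition estimates from the table in this section.
TABLE_PCT = {
    "Billions of people with mental health problems": 100,
    "End of Western ability to affect the future": 92,
    "Global economic stagnation": 80,
    "Global dystopian lock-in": 75,
}


def combined_range(extinction_pct: int, reduction_range_pct: tuple) -> tuple:
    """Add a point estimate to a (low, high) range.

    Assumes the two outcomes are mutually exclusive, so probabilities sum.
    """
    low, high = reduction_range_pct
    return (extinction_pct + low, extinction_pct + high)


total_low, total_high = combined_range(EXTINCTION_PCT, REDUCTION_RANGE_PCT)
print(f"P(extinction or reduction | unaligned AGI): {total_low}%-{total_high}%")

for definition, pct in TABLE_PCT.items():
    inside = total_low <= pct <= total_high
    print(f"{pct:>3}%  within range: {inside}  ({definition})")
```

Under that assumption, the total matches the "between 50% and 100%" range given earlier, and every table entry falls inside it.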

All Parts:

P1: Historical Priors

P2: Behavioral Modification

P3: Anti-economy and Signal Pollution

P4: Bioweapons and Philosophy of Modification

P5: X-risk vs. C-risk

P6: What Can Be Done