The necessity of "Guardian AI" and two conditions for its achievement

Summary

Note: Summarybot has created an excellent summary, so I will use it as the summary for this article.

Executive summary: To protect humanity from existential risks posed by advanced technologies, we must develop an aligned superintelligent "Guardian AI" to preemptively eliminate these risks, which requires achieving both technical AI alignment and political AI governance.

Key points:

The "vulnerable world hypothesis" posits that beyond a certain level of technological advancement, existential risks to humanity will dramatically increase unless unprecedented preventive measures are taken.
Eliminating existential risks in advance is likely biologically impossible for humans due to the immense challenges involved, such as making accurate long-term predictions and developing defensive technologies.
Delegating the task of protecting humanity to a superintelligent "Guardian AI" is proposed as the only viable solution, as it could preemptively predict and address existential risks.
Two critical conditions must be met to realize a safe and beneficial Guardian AI: solving the technical challenge of "AI alignment" to ensure the AI follows human values and intentions, and establishing the political frameworks for global "AI governance".
Organizations and decision-makers worldwide should strongly support and prioritize AI alignment research and AI governance initiatives, as they are crucial for safely transitioning to a post-Singularity future.

Preface

If technology continues to develop at its current pace, there is a high likelihood that technologies capable of destroying civilization with near certainty will become widely available to the general public within the next few years to decades. As a result, the catastrophic risks (i.e., existential risks) that could lead to the collapse of civilization are expected to increase significantly. This paper examines the "vulnerable world hypothesis" proposed by Nick Bostrom and proposes methods to address the existential risks associated with future technological innovations.

To state the conclusion first: to sufficiently eliminate the existential risks associated with technological innovation, a "guardian AI" with superhuman capabilities will be necessary. This paper argues that by utilizing this "guardian AI," all x-risks can be preemptively and perfectly eliminated. However, two conditions must be met to realize a guardian AI: AI alignment and AI governance. This paper therefore strongly supports increased financial support, researcher participation, and awareness in these two areas.

What is existential risk?

"Existential risk" (abbreviated as x-risk) refers to threats that could lead to the premature extinction of intelligent life originating from Earth or permanently and significantly destroy its potential for desirable future development.[1] This includes risks that could result in the extinction of humanity or the collapse of civilization, causing globally significant harm.

More generally, these are also expressed as "catastrophic risks," but in this paper, they are collectively referred to as "x-risks." Risks of concern as x-risks include severe risks associated with advanced technologies such as AGI (Artificial General Intelligence), nanotechnology, and biotechnology.

Examination of the vulnerable world hypothesis

The "vulnerable world hypothesis" posits that there exists a level of technological advancement at which civilization is almost certain to be destroyed unless unprecedented preventive measures and global governance are implemented.[2]

The hypothesis warns that once technological progress crosses a certain threshold (i.e., the red line), x-risks will increase significantly, and the likelihood of human civilization being destroyed will rise dramatically.

Furthermore, humanity is likely to reach this red line in the relatively near future. This paper accepts the hypothesis and examines ways to avoid the outcome it predicts as technology continues to progress.

The following risks are cited under the vulnerable world hypothesis:

If highly autonomous AI systems become widespread globally, the barriers to carrying out acts of mass destruction will be drastically lowered, potentially democratizing the capability for mass destruction (i.e., anyone in the world could easily destroy civilization).
If such AI systems are not adequately managed or controlled, for example, if they are developed or operated as open-source, the risks of misuse or accidents involving advanced AI will significantly and severely increase.
If biotechnology becomes widely practical, there is a high possibility that malicious individuals or organizations could initiate an artificial pandemic. If advanced genetic editing techniques become widely accessible, even individuals or small organizations could design and produce highly lethal viruses. Additionally, with the aid of highly autonomous AI systems, individuals without technical expertise could easily misuse biotechnology. The uncontrolled development of biotechnology could lead to the collapse of civilization due to an artificial pandemic.
If nanotechnology advances significantly without the establishment of global regulation and management systems in time, numerous risks associated with nanotechnology could spread worldwide. For example, if nanomachines become widespread, the number of potential sources of nanotechnology-related x-risk would grow to match the world's population. A catastrophic scenario could occur on a global scale, such as a young person in one country misusing or abusing nanomachines, leading to the world being destroyed by grey goo. If the pace of technological development surpasses the pace of safety development, the scenario predicted by the vulnerable world hypothesis is highly likely. Currently, technology is accelerating much faster than safety measures.
If hardware technology advances significantly, making powerful computing capabilities cheaper and more compact, the economic and technical barriers to developing advanced AI will be drastically lowered. Within the next ten years, it might become possible for anyone in the world to develop advanced AI freely. Consequently, the number of potential sources of AI-related x-risk would grow to match the world's population. If countless individuals use advanced hardware to develop AGI and those AGI systems become uncontrollable or are misused, it could lead to catastrophic situations. If it becomes easy for individuals to develop advanced AI, the traditional mechanisms for managing safety could collapse, leading to a state of technological anarchy.

The uncontrolled development of advanced technology could lead to the democratization of destructive capabilities. If technological innovation continues at its current pace, there is a possibility that, one day, everyone in the world could suddenly possess the ability to destroy civilization. Just as ChatGPT suddenly became widely available in late 2022, access to destructive capabilities could suddenly become widespread globally.
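As a rough illustration of why the number of actors matters, consider a toy model that is not part of Bostrom's hypothesis and is introduced here only as an assumption: each of N actors independently has the same small probability p of triggering a catastrophe in a given period. The chance that at least one of them does so is then 1 - (1 - p)^N. The function name and the sample values of N and p below are hypothetical and chosen purely for illustration; real actors are neither independent nor identical.

```python
import math


def prob_at_least_one(n_actors: int, p_per_actor: float) -> float:
    """Probability that at least one of n independent actors triggers a catastrophe.

    Toy model (an assumption, not a forecast): every actor has the same
    independent probability p_per_actor, so the result is 1 - (1 - p)^n.
    """
    if n_actors < 0:
        raise ValueError("n_actors must be non-negative")
    if not 0.0 <= p_per_actor <= 1.0:
        raise ValueError("p_per_actor must be between 0 and 1")
    if p_per_actor == 1.0:
        # log1p(-1.0) is undefined, so handle certainty explicitly.
        return 1.0 if n_actors > 0 else 0.0
    # log1p and expm1 keep precision when p is tiny and n is huge.
    return -math.expm1(n_actors * math.log1p(-p_per_actor))


# Hypothetical values: a one-in-a-billion chance per actor,
# with destructive capability spread over ever more people.
p = 1e-9
for n in (10, 1_000_000, 8_000_000_000):
    print(f"actors={n:>13,}  P(at least one catastrophe)={prob_at_least_one(n, p):.6f}")
```

Under these assumptions, a per-actor probability that is negligible for a handful of actors approaches certainty once capability is spread across billions of people, which is the intuition behind the "democratization of destructive capabilities" concern.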

This issue, referred to as "easy nukes," highlights the potential for various technologies with destructive capabilities equivalent to nuclear weapons to become widespread through technological innovation, leading to a significant increase in x-risks.

This concern encompasses various innovative technologies. The technological level to which the vulnerable world hypothesis applies is not limited to specific technologies but applies to all technologies beyond a certain level of advancement. This "certain level of technological advancement" likely refers to numerous innovative technologies realized after the development of advanced AGI (Artificial General Intelligence), such as biotechnology, nanotechnology, robotics, drones, 3D printing, genetic engineering, neuroengineering, and so on.

Additionally, a more representative and imminent risk is the risk of "misaligned AI." This will be discussed in more detail under "AI alignment," but it is a globally concerning risk.

*Misaligned AI refers to AI that does not align with human intentions or values and is unfriendly to humanity. Conversely, AI aligned with human values is referred to as "aligned AI," defined as AI that shares the same values and goals as humans or seeks to do what humans desire.

The vulnerable world hypothesis is a significant issue that should be seriously considered despite its general lack of recognition. It is a crucial hypothesis that should be examined globally, especially in the modern era where rapid technological innovation driven by AI technology is anticipated.

The future technological innovations predicted as the Singularity (technological singularity) may also have negative aspects, such as an increase in x-risks. This is particularly significant in the current era of advancing AI development and scientific applications. Ignoring this hypothesis could be tantamount to ignoring the potential collapse of future civilization.

Before technology advances further and increases x-risks, it is necessary to devise and implement measures to eliminate x-risks. Before moving on to specific considerations, it is essential to introduce the "precautionary principle," a crucial guiding principle in this context.

Application of the precautionary principle

By definition, if an x-risk occurs even once, a large part of human civilization will be irreversibly and completely destroyed. Therefore, we must eliminate all x-risks "in advance and perfectly." Consequently, the precautionary principle is applied to address x-risks.

・Precautionary principle

"Our approach to existential risks cannot be one of trial-and-error. There is no opportunity to learn from errors. The reactive approach—see what happens, limit damages, and learn from experience—is unworkable. Rather, we must take a proactive approach. This requires foresight to anticipate new types of threats and a willingness to take decisive preventive action and to bear the costs (moral and economic) of such actions."[3]

In other words, we need to address all conceivable x-risks in advance, regardless of the economic, political, or moral costs. If an x-risk occurs even once, it will be too late to do anything. Therefore, risks must be predicted and eliminated before they occur. This necessitates formulating and implementing bold and fundamental solutions, which might be considered somewhat extreme, to anticipated future risks in advance.

The precautionary principle indicates that the traditional approach to risk—solving problems through repeated trial and error—does not apply to x-risks. This is because dealing with x-risks does not allow for even a single failure, and all x-risks must be eliminated "in advance and perfectly" in one attempt. Thus, it is necessary to make careful predictions and inferences early on and to formulate and implement bold solutions for unseen threats preventively. We must eliminate all x-risks "in advance and perfectly," regardless of the cost.

Biologically impossible task

From the above, to protect humanity's future, we must take the vulnerable world hypothesis as a premise and follow the precautionary principle by eliminating future x-risks in advance.

Unfortunately, it is likely biologically impossible for humans or human-derived organizations to accomplish this task (eliminating all x-risks in advance and perfectly). This is because the skills and abilities required to eliminate x-risks are likely to exceed human biological capabilities.

1. Accurate and extensive future prediction to identify all possible x-risks in advance

To accomplish this task, it is essential to identify all possible x-risks in advance. We need to observe all scenarios the world may experience in the future and make accurate predictions before technological innovations occur to ensure we are never outpaced by rapid technological innovation.

Humans cannot make the necessary future predictions due to various cognitive biases and simple limits in information processing capabilities.
However, an as-yet hypothetical entity defined as "superintelligent AI" could potentially achieve this task using ultra-precise computer simulations.

2. Pre-development of defensive technologies against future x-risks

To eliminate x-risks associated with technological innovation, it is necessary to research, develop, and disseminate technologies (defensive technologies) that mitigate the negative aspects of advanced technology in advance.

We need to develop defensive technologies that address all conceivable technologies based on future predictions, but humans cannot perform the prerequisite future predictions.
Developing defensive technologies requires enormous resources and time, and human work efficiency cannot keep up with technological innovation.

3. Perfect surveillance system worldwide

Since x-risks can occur worldwide, it is necessary to continuously and adequately monitor the entire world to ensure global safety.

It is impossible for human resources or capabilities to monitor global developments in real-time and instantly detect potential risks.
It may also be necessary to guide the actions of specific actors based on future predictions towards safer behaviors, but human organizations cannot achieve this for various reasons.

4. Overwhelming capabilities to eliminate x-risks

To eliminate x-risks worldwide, overwhelming capabilities in technological, economic, political, social, cultural, and other areas are needed. These capabilities are collectively defined as "intelligence," but human intelligence is insufficient for this task.

Humans are limited by biological constraints determined by DNA, and no one in humanity possesses the necessary capabilities.

To sufficiently eliminate x-risks, it is necessary to "predict all possible scenarios in advance, develop defensive technologies to address them in advance, continuously monitor the entire world, and continuously eliminate all risk factors." Additionally, superhuman persuasive abilities and superhuman information processing capabilities to make all these possible will also be required.

The difficulty of such a task greatly exceeds the capabilities of humans or human-derived organizations, making it biologically impossible for human organizations to achieve this task sufficiently. Therefore, it is impossible for humanity to eliminate x-risks on its own, and impossible to avoid the outcome predicted by the vulnerable world hypothesis as technological innovation proceeds.

Eliminating x-risks with guardian AI

To eliminate the x-risks predicted by the vulnerable world hypothesis, we need to delegate this task to an entity with capabilities far beyond those of humans. This entity is the "guardian AI."

The "guardian AI" is a superintelligent AI system designed to protect humanity from all potential catastrophic/existential risks. This system would continually function to safeguard humanity from numerous x-risks associated with technology and other factors. If realized, the guardian AI would function as the minimum order applicable to the entire universe.

Through some representative organization, humanity can appoint the first superintelligent AI as the guardian AI, instructing it to preemptively and perfectly predict, address, eliminate, and manage all x-risks and to continue doing so. In this process, the guardian AI must be granted extensive authority applicable worldwide and allowed to function with a certain degree of autonomy. However, delegating authority and granting autonomy to the guardian AI could result in the permanent loss of control over the superintelligent AI. Therefore, when giving these instructions, it must be ensured that the superintelligent AI aligns with humanity and remains subservient to or shares common interests with humanity indefinitely. The conditions for achieving this will be discussed later under "AI alignment."

The only entity likely capable of sufficiently eliminating the "once-occurring risk that would destroy everything" known as x-risk is the superintelligent AI. At the very least, it is clear that a police organization operated by humans cannot satisfactorily accomplish this task. Whether we like it or not, to live safely in a world that has reached the Singularity and where technology is rapidly advancing, the only way to eliminate x-risks associated with advanced technology is by using superintelligent AI.

Considering and rejecting alternative means

However, is a guardian AI with superintelligent capabilities really the only entity that can accomplish this task? For example, could a superhuman police force, cognitively and physically enhanced to tens of times beyond current levels, handle these tasks? In a world where the vulnerable world hypothesis applies, such human augmentation technologies should also have been developed. Admittedly, such a superhuman police force might be able to accomplish this task.

However, this would likely occur after x-risks have already occurred once or more, by which time it would be too late. The issues of concern here all involve "a single chance," and if the anticipated threats occur even once, it is game over. Therefore, following the precautionary principle, all x-risks must be eliminated in advance and perfectly. We cannot wait for the police to be enhanced; we must implement defensive measures based on anticipated future technological advancements before technology reaches a dangerous level.

Potential of guardian AI

Facing such a time limit, this paper posits that "guardian AI can make it in time." In other words, the speed of development of the various technologies of concern in the vulnerable world hypothesis is considered to be slower than the speed of AI development. The future risks we need to address are likely to become apparent some time after advanced AI, such as AGI, has been developed. The pace of AI development significantly outstrips that of any other technology, likely outpacing nanotechnology and biotechnology by several years to decades.

Moreover, advanced AI technology likely possesses "intelligence" that is incomparable to humans, and this superhuman intelligence could potentially be applied to the task of eliminating x-risks. In other words, there is a high likelihood that we can develop a system to preemptively and perfectly eliminate the x-risks associated with the technologies applicable under the vulnerable world hypothesis before they are developed. This system is the "guardian AI," a superintelligent AI system designed to protect humanity from x-risks.

If we can develop the superintelligent AI system (guardian AI) to eliminate x-risks before the x-risks associated with advanced technology become apparent, humanity is likely to avoid the outcome the vulnerable world hypothesis predicts. The superintelligent AI is very likely to achieve advanced tasks that humans cannot, and this ability would also apply to eliminating x-risks.

Two conditions for guardian AI

However, to construct a safe and powerful guardian AI, it is necessary to achieve two extremely difficult conditions. These consist of a technical condition and a political condition, defined respectively as "AI alignment" and "AI governance."

Considering the aforementioned time limit and the imminent x-risks associated with the rapid advancement of AI, it is likely that these two conditions need to be met by 2030. This paper outlines the overview and necessity of these two fields and concludes by strongly supporting increased financial support, researcher participation, and awareness in these areas.

Note: The following discussion might seem like common knowledge to many of you. This community consists of the theoretical pioneers in these fields, and someone like me does not need to explain these concepts to you. However, for structural reasons, please allow me to continue. While most of this will be familiar, there are new concepts such as "intelligence restriction." Nevertheless, for those well-versed in this field, it might be best to skip to the latter part of AI governance and the conclusion.

First condition: AI alignment

Overview

AI alignment is a field of research aimed at ensuring that AI systems, which are more advanced than humans, align with human values and ethics and act according to human intentions.[4] As AI advances and surpasses human intelligence, there is an increasing likelihood that AI will pursue goals unintended by humans or take actions that are undesirable or dangerous for humans. AI alignment exists to preemptively prevent such catastrophic risks and ensure that AI becomes a friendly entity to humans.

Necessity

1. To avoid x-risks by AI

Misaligned AI (AI that does not follow human intentions) is globally considered the most significant x-risk. If advanced AI pursues goals contrary to human interests or goals unintended by humanity, it could lead to catastrophic situations. In a 2022 expert survey, the median probability that advanced AI would lead to extremely bad outcomes (such as human extinction) was 10%.[5]

Note: This survey was conducted before ChatGPT became globally prevalent, and the figures might be more severe as of 2024.

In May 2023, hundreds of prominent AI researchers and figures, including Geoffrey Hinton, Yoshua Bengio, and Sam Altman, jointly signed a statement declaring that "mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."[6]

In June of the same year, the UN Secretary-General stated that such warnings must be taken seriously, and in November, at the UK's AI Safety Summit, 28 countries and the European Union, including the US and China, jointly signed the Bletchley Declaration agreeing that "AI poses significant risks."[7][8]

The concern over AI x-risks is a global concern shared by prominent researchers and decision-makers, no longer a subject of science fiction. To avoid x-risks posed by AI, solving the AI alignment problem is essential.

2. To utilize superintelligent AI as guardian AI

To utilize superintelligent AI as guardian AI, it is first necessary to ensure that the AI follows human instructions. Without solving the alignment problem, it would be impossible to implement the idea of guardian AI. On the other hand, if the alignment problem can be solved, it is highly likely that superintelligent AI can be used as guardian AI.

Current status and problems

AI alignment is an urgent issue being tackled by various companies and research institutes, but currently, the number of researchers and the amount of funding are very limited. It is said that among the 100,000 AI researchers worldwide, there are only about 400 AI alignment researchers.[9]

While AI technology is advancing exponentially and rapidly, AI alignment research is not keeping up with that speed at all. Prominent AI researchers and stakeholders point out that "there is a possibility of achieving AGI within the next 3 to 5 years," yet there are hardly any signs that the AI alignment problem will be resolved by then.[10]

Relative to its severity, the AI alignment problem is far too little known globally. It should be treated as a far more dangerous and critical issue than global warming, but such discussions are not common. Although there are still many uncertainties in AI alignment, the international community should address it as insurance for all humanity.

Second condition: AI governance

Overview

AI governance is a collective term for the norms, policies, institutions, and processes that ensure increasingly powerful AI is managed and utilized safely and beneficially so society as a whole can benefit from it.

Especially concerning the issues raised by the vulnerable world hypothesis, global governance will be necessary. This includes political initiatives to establish new international institutions to manage and utilize advanced AI globally.[11] Global governance is an increasingly important concept, requiring worldwide discussion and action.

Necessity

1. To safely manage and utilize superintelligent AI

To safely manage superintelligent AI and ensure that all humanity can equally enjoy its benefits, global political authority, organizations, policies, and other frameworks are indispensable. If AI alignment succeeds and humanity can manage AI, it will be necessary to manage and utilize that AI globally through some international institution.

2. To grant appropriate authority and instructions to superintelligent AI and have it function as guardian AI

To have superintelligent AI effectively function as guardian AI, it is necessary to grant it appropriate authority and instructions. For this purpose, an international institution with supranational authority and legitimacy is required. While this organization does not necessarily have to be a world government, it must at least have supranational authority for humanity's long-term future.

"Supranational authority" refers to global authority that surpasses all national laws and has the most powerful influence worldwide. Future global governance organizations must partially or fully possess this supranational authority.

3. To maintain a singleton by guardian AI and implement intelligence restriction

A "singleton" refers to a world order where a single decision-making entity exists at the top, exercising effective control within its domain and protecting its supreme authority from internal or external threats.[13]

To eliminate x-risks using guardian AI, it is necessary to establish and maintain a singleton centered around it (a single world order by the only superintelligent AI). For guardian AI to reliably eliminate x-risks, it must remain overwhelmingly powerful in this world. Therefore, intelligence restriction must be implemented for all other AI systems and living beings, including humans.

"Intelligence restriction" literally means setting an upper limit on the intelligence of the target, permanently preventing it from becoming more intelligent than that limit. However, the upper limit of intelligence restriction in the post-Singularity era would be an astronomical intelligence quotient, so other intelligences besides the singleton would not fall into ignorance or suffer disadvantages.

Additionally, to instruct guardian AI to implement intelligence restriction, an international institution that can represent humanity's consensus would be necessary.

Current status and problems

Various organizations are working on AI governance, but they are clearly at a disadvantage in terms of political influence compared to giant private companies. For example, in the US in 2023, officially confirmed lobbying funds show that such companies outspent these organizations by at least 5 to 10 times.[14]

The enormous importance of AI governance also requires broader recognition and understanding. Politicians and bureaucrats, as decision-makers, should be more actively involved with these organizations.

Scholars and experts from various academic fields such as public policy, economics, sociology, law, and theology should more actively participate in the comprehensive field of AI governance. Interdisciplinary cooperation and participation will further develop this field and stimulate global discussion and action.

Conclusion

This paper concludes that if we utilize an aligned superintelligent AI as guardian AI, there is a high possibility that all future x-risks can be preemptively and perfectly eliminated. Therefore, AI alignment and AI governance are the only and greatest issues, serving as the master keys that determine all futures. Hence, organizations and decision-makers worldwide should strongly promote and implement further financial support, researcher participation, and increased awareness in these two fields.

No matter what x-risks arise in the future, if the most advanced AI in this world is aligned with humans, simply instructing that AI to eliminate x-risks would preemptively and perfectly eliminate those risks. However, if an AI more advanced than humans is not aligned with humans, catastrophic situations will occur even before future risks become a concern.

Additionally, even if the technical condition of AI alignment is achieved, without achieving the political condition of AI governance, a system like guardian AI will not be realized, and our civilization may end up in the scenario that the vulnerable world hypothesis concerns (i.e., the collapse of civilization due to uncontrolled technological innovation).

The future of humanity is driven by two forces: technological power and political power. To overcome the challenges of the future, it is necessary to achieve the two conditions of AI alignment and AI governance.

If AI alignment and AI governance are achieved, there is a high possibility that all other risks can be resolved. Therefore, organizations and decision-makers worldwide should devote more time and resources to these two fields.

Afterword

The most desirable future imaginable as of 2024 is one where we succeed in aligning superintelligent AI and establishing global governance of superintelligent AI (creating a supranational international institution and establishing a democratic system for managing and utilizing superintelligent AI on behalf of all humanity). Such a future would likely begin with the first prompt to superintelligent AI being: "Preemptively and perfectly predict, address, eliminate, and manage all catastrophic/existential risks, and continue doing so."

We earnestly hope that humanity can somehow overcome this issue and safely transition to a better future in the post-Singularity era while sharing common interests as humans. The idea of "guardian AI" proposed in this paper was devised to enhance that possibility even slightly.

Note: This paper was conceived based on the knowledge and predictions as of 2024, and further improvements or, in some cases, partial or complete abandonment of the idea may be necessary as the situation changes. Free reproduction, citation, critique, and translation without the author's permission are welcome.
