Cross-posted on our website: https://www.convergenceanalysis.org/publications/ai-clarity-an-initial-research-agenda
Cross-posted on the EA Forum: https://forum.effectivealtruism.org/posts/JyhoTRXxYvLfFycXi/ai-clarity-an-initial-research-agenda
Executive Summary
Transformative AI (TAI) has the potential to solve many of humanity's most pressing problems, but it may also pose an existential threat to our future. This significant potential warrants careful study of TAI's possible trajectories and their corresponding consequences. Scenarios in which TAI emerges within the next decade are likely among the most treacherous, since society will not have much time to prepare for and adapt to advanced AI.
In response to this need, Convergence Analysis has developed a research program we call AI Clarity. AI Clarity's research method centers on scenario planning, an analytical tool used by policymakers, strategists, and academics to explore and prepare for the landscape of possible outcomes in domains defined by uncertainty. Though there is no single best method, scenario planning generally combines two activities: exploring possible scenarios, and evaluating possible strategies across those scenarios. Accordingly, AI Clarity intends to explore possible AI scenarios and evaluate strategies across them.
In the first area of research, “exploring scenarios,” AI Clarity will (1) identify pathways to plausible existential hazards, (2) collect and review publicly-proposed AI scenarios, (3) select key parameters across which AI scenarios vary, and (4) generate additional scenarios that arise from combinations of those parameters. In the second area of research, “evaluating strategies,” AI Clarity will (1) collect and review strategies for AI safety and governance, (2) evaluate strategies for their performance across AI scenarios, and (3) develop and recommend strategies that best mitigate existential risk across all plausible scenarios.
Over the coming year, we are publishing a series of blog posts delving into the two principal research areas outlined above. These blog posts will collectively build a body of research aimed at clarifying important uncertainties in AI futures.
Motivation
The 21st century has witnessed a precipitous rise in the power and potential of artificial intelligence. Advances in semiconductor technology and widespread use of the internet have raised computational limits and expanded the availability of data, respectively, and algorithmic breakthroughs have enabled AI capabilities to scale with both. As a consequence, global investment in AI has soared, and AI systems have become widely adopted and integrated into society. Several leading AI labs, such as OpenAI[1] and Google DeepMind,[2] are explicitly pursuing Artificial General Intelligence (AGI) — systems that match or surpass human capabilities across all cognitive domains.
The development and deployment of AGI, or similarly advanced systems, could constitute a transformation rivaling those of the agricultural and industrial revolutions. Transformative AI (TAI) has the potential to help solve many of humanity's most pressing problems, but it may also pose an existential threat to our future.
One set of possible trajectories involves “short timelines” to TAI, in which TAI emerges within, say, 10 years. This is not merely academic speculation: some leading AI experts[3] and prediction markets[4] expect TAI sooner rather than later. Scenarios in which TAI emerges within the next decade are likely among the most treacherous, since society will not have much time to prepare for and adapt to advanced AI. It is for these reasons that the possibility of short timelines will be a key focus of AI Clarity's research.
TAI governance is defined not only by its urgency but also by its lack of clarity. Key actors, researchers, and organizations disagree about 1) the magnitude of existential risk,[5] 2) the share of that risk attributable to different threat models, and 3) which strategies might best mitigate it.
A natural response to uncertainty is forecasting. However, the unprecedented nature of TAI makes it resistant to forecasting methods. For example, in July 2023, the Forecasting Research Institute released the results of a long-run forecasting tournament on existential risks. The results emphasized the importance of TAI governance: both ‘superforecasters’ and domain experts estimated that the probability of human extinction from AI was greater than from any other cause. However, despite months of debate, the groups’ estimates failed to converge. The report observes that “[t]he most pressing practical question for future work is: why were superforecasters so unmoved by experts’ much higher estimates of AI extinction risk, and why were experts so unmoved by the superforecasters’ lower estimates?”
It’s possible that forecasting in the domain of TAI governance will become more tractable given time or different methods. But we shouldn’t rely on it. Instead, AI Clarity will explore scenario planning as a complementary approach to forecasting.
Scenario planning is an analytical tool used by policymakers, strategists, and academics to guide decision-making in domains dominated by irresolvable uncertainty.[6] Though scenario planning is underrepresented in AI safety, there is growing awareness of its value as a complement to forecasting[7] and of its relevance to TAI governance.[8]
As AI systems grow increasingly capable and widespread, their consequences for social systems, economic systems, and global security may become both less predictable and more dramatic. In response to this uncertainty, AI scenario planning can provide a structured framework for exploring a wide range of potential AI futures. It may also help us identify the critical intervention points common to avoiding the worst futures.
This methodology enables us to explore and prepare for possibilities that are often overlooked in more traditional, narrowly focused safety research. It encourages a broader and more holistic view of AI's potential impacts, encompassing a wider range of possibilities beyond the most immediate or obvious risks.
Put another way: AI risk is marked by many uncertainties, and the first step to solving any problem is to understand what the problem is. Focusing on AI scenarios is an attempt to understand directly what the problem of AI risk is.
Our Approach
Scenario planning
The main defining feature of AI Clarity’s research approach is our application of scenario planning to AI safety and AI governance.
Though there is no one standard methodology, scenario planning can be seen as combining two major activities:
1. Exploring possible scenarios, and
2. Evaluating possible strategies across those scenarios.
In the context of AI risk, we might call the first activity AI scenario research, and the second activity AI strategy research. Together, they provide a structure to envision and plan for a variety of potential futures. This makes scenario planning a particularly valuable tool for decision-making in areas — like AI risk — which are marked by significant uncertainties regarding the future.
Overview of our approach
AI Clarity intends to conduct high-quality AI scenario and strategy research.
Exploring AI scenarios
We take the term AI scenario to refer to a possible pathway of AI development and deployment, encompassing both technical and societal aspects of this evolution. An AI scenario may be specified very precisely or very broadly. A highly specific scenario could, for example, describe a particular pathway to transformative AI, detailing what happens each year and at every key juncture of development. A more general scenario (or set of scenarios) could be “transformative AI is reached through comprehensive AI services”.[9]
This exploration of AI scenarios will include:
1. Identifying pathways to plausible existential hazards,
2. Collecting and reviewing publicly-proposed AI scenarios,
3. Selecting key parameters across which AI scenarios vary, and
4. Generating additional scenarios that arise from combinations of those parameters (a toy sketch of this step follows below).
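To make steps 3 and 4 concrete, here is a minimal sketch, assuming a small set of purely hypothetical parameters and values (the real parameter set is an output of the research, not assumed here), of how candidate scenarios can be enumerated as combinations of parameter values:

```python
from itertools import product

# Hypothetical scenario parameters and values, for illustration only;
# these are not the parameters AI Clarity has selected.
parameters = {
    "timeline_to_tai": ["under 5 years", "5-10 years", "over 10 years"],
    "takeoff_speed": ["slow", "fast"],
    "leading_actors": ["single lab", "several labs", "state-led projects"],
}

# Each candidate scenario is one combination of parameter values (step 4).
scenarios = [
    dict(zip(parameters.keys(), combination))
    for combination in product(*parameters.values())
]

print(f"{len(scenarios)} candidate scenarios")  # 3 * 2 * 3 = 18
print(scenarios[0])  # e.g. {'timeline_to_tai': 'under 5 years', ...}
```

Even this toy version shows why parameter selection matters: the number of candidate scenarios grows multiplicatively with each added parameter, so much of the work lies in choosing a few parameters that capture the most decision-relevant variation.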
Evaluating AI strategies
In addition to charting out the landscape of AI scenarios through the exploratory work detailed above, AI Clarity also seeks to describe how to positively influence the outcomes of AI development. This is pursued through the following research activities:
1. Collecting and reviewing strategies for AI safety and governance,
2. Evaluating strategies for their performance across AI scenarios (a minimal sketch follows this list), and
3. Developing and recommending strategies that best mitigate existential risk across all plausible scenarios.
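As a minimal sketch of what “evaluating strategies across scenarios” could look like in its simplest form, the snippet below scores hypothetical strategies against hypothetical scenarios and selects the one with the best worst-case score. All names and numbers are invented placeholders, and maximin is only one of several possible robustness criteria:

```python
# Hypothetical scores (0-10) for how well each strategy mitigates risk in each scenario.
# Strategies, scenarios, and scores are illustrative, not research findings.
scores = {
    "compute governance":   {"short timelines": 7, "long timelines": 6, "multipolar": 5},
    "lab self-regulation":  {"short timelines": 4, "long timelines": 7, "multipolar": 3},
    "international treaty": {"short timelines": 3, "long timelines": 8, "multipolar": 7},
}

def worst_case(strategy_scores):
    """Robustness under a maximin criterion: a strategy is only as good as its worst scenario."""
    return min(strategy_scores.values())

most_robust = max(scores, key=lambda strategy: worst_case(scores[strategy]))
print(most_robust, worst_case(scores[most_robust]))  # -> compute governance 5
```

In practice the scoring itself is the hard part, and a single number per cell hides a great deal; the point of the sketch is only that making the scenario-by-strategy structure explicit is what allows strategies to be compared for robustness rather than for performance in a single assumed future.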
Major Research Activities for 2024 and Beyond
AI Clarity’s research output will feature a series of blog posts highlighting the results of our work. These outputs will coalesce under two major themes, exploring TAI scenarios and evaluating TAI strategies, corresponding to the two major activities of scenario planning detailed earlier. Our near-term research efforts will place particular emphasis on threat models with short timelines to TAI. Publishing this series of blog posts will enable continuous interaction with the wider AI research and policy communities, creating regular opportunities for feedback.
Our progress (as of April 2024)
As of early 2024, we’ve begun publishing a series of blog posts to share our initial explorations into clarifying AI scenarios:
Our team is concurrently working on two more blog posts in these areas. The first explores short-timelines scenarios, examining how various parameters combine to produce a world in which timelines to TAI are short. The second explores theories of victory: the conditions and combinations of parameters that produce desirable states of the world.
Theory of Change
The motivation for AI Clarity and our approach to research is firmly rooted in Convergence’s organizational-level Theory of Change.
Figure 1. Visualization of the structure of Convergence Analysis’ organizational Theory of Change.
In order to mitigate existential threats from AI, Convergence seeks to improve decision making for AI safety and governance. To do this, we must first understand the societal and technical implications of AI development and deployment. Scenario planning, the focus of AI Clarity, is a key part of our efforts to deepen this understanding. Convergence will also produce concrete guidance through our governance recommendations research. Second, we must act to inform key parties about critical insights from our research, thus supporting decision makers in pursuing effective strategies. This includes building consensus within the AI safety community, raising public awareness about the risks from AI, and advising relevant government actors.
We’re hoping to achieve these outcomes for decision making in AI safety:
This improved decision making is aimed at steering the development of advanced AI away from existential risk and towards a safe and flourishing future for humanity.
Downside risks of this work
While AI Clarity's work is vital to this theory of change, it's important to recognize some potential downside risks:
With these potential downside risks in mind, we intend to pursue the following mitigation strategies:
Conclusion
This research agenda lays out an approach to reduce AI existential risk through rigorous scenario planning. This approach involves deeply exploring the landscape of AI scenarios and evaluating strategies across them. We will collect existing perspectives, analyze parameters, generate additional scenarios, and identify effective interventions.
Through 2024 and beyond, AI Clarity will share insights and foster collaboration through public blog posts. Based on our findings, we will offer actionable recommendations to stakeholders across domains such as technical safety research, corporate governance, national regulation, and international cooperation. This work seeks to improve strategic consensus, inform policies, and coordinate interventions to enable more positive outcomes with advanced AI.
NOTES
[1] https://openai.com/blog/planning-for-agi-and-beyond
[2] https://arxiv.org/abs/2311.02462
[3] https://arxiv.org/abs/2401.02843
[4] https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/
[5] https://forecastingresearch.org/xpt
[6] https://doi.org/10.1016/j.futures.2012.10.003
[7] https://www.foreignaffairs.com/articles/united-states/2020-10-13/better-crystal-ball
[8] https://www.imf.org/-/media/Files/Publications/Fandd/Article/2023/December/30-33-korinek-final.ashx
[9] https://www.fhi.ox.ac.uk/reframing/