Crossposted from the Metaculus Journal.
Over a two-day workshop, Metaculus Pro Forecasters and subject-matter experts (SMEs) from several organizations evaluated policy directions for reducing AI risk using scenario planning and forecasting methodologies. The key conclusions:
The increasing capabilities of artificial intelligence have led to a growing sense of urgency to consider the potential risks and possibilities associated with these powerful systems, which are now garnering attention from public and political spheres. Notably, the Future of Life Institute’s open letter calling for a six-month pause on leading AI development collected over 25,000 signatures within five weeks of release, and senior White House officials have met with AI leaders to discuss risks from AI.
This report presents the findings of a structured process, organized by Metaculus and the Future of Life Institute, which brought together Pro Forecasters and SMEs to begin identifying, quantitatively, the most impactful policy directions for reducing existential risk from misaligned AI (hereafter: “AI risk”).
The AI Pathways workshop combined a range of plausible scenarios for an AI future with probabilistic forecasts, leveraging the judgments of both SMEs and Metaculus Pro Forecasters. The goal of the exercise was to identify the most impactful actions for steering toward a positive future and away from a negative future.
We—the Metaculus AI Forecasting team, in collaboration with the FLI policy team—began by generating four possible states of the world in 2030, focusing on the impact of AI, and asking experts to identify developments which would likely play a large role in driving the world toward each of these four scenarios. The scenarios were developed with two key areas of uncertainty in mind: 1) takeoff speed and the related question of unipolarity versus multipolarity; 2) cooperation versus competition—between labs and between countries. The “Scenarios” section below outlines these scenarios in more depth. We worked with the SMEs to identify “indicators” or “drivers” of these scenarios, ranked these, identified potential U.S. government policies and leading lab coordination actions that could pertain to the most important twenty or so indicators, and developed corresponding forecasting questions.
Workshop teams, made up of integrated groups of technical AI experts, AI policy experts, and at least one Pro Forecaster, then forecasted on these questions. In practice, most of the forecasting questions asked about either:
All probabilities quoted in this report are the median forecast from the workshop and include forecasts from Pro Forecasters and SMEs. There were around 20 workshop participants, and the forecasting questions in this report were forecasted on by 5 or 6 participants on average. Quoted forecasts should be taken as having low to moderate resilience, and not as being fully authoritative.
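As a rough illustration of what a median aggregate over 5 or 6 forecasts looks like, here is a minimal sketch. The forecast values are hypothetical and are not workshop data. The report does not say how ties or even-sized groups were handled, so the sketch assumes the standard convention of averaging the two middle values.

```python
from statistics import median

# Hypothetical forecasts from six participants on one question (NOT workshop data).
forecasts = [0.05, 0.10, 0.12, 0.18, 0.25, 0.40]

# With an even number of forecasts, statistics.median averages the two middle values.
aggregate = median(forecasts)

# The median ignores how extreme the highest forecast is:
# raising it from 0.40 to 0.95 leaves the aggregate unchanged.
with_outlier = median([0.05, 0.10, 0.12, 0.18, 0.25, 0.95])

# But with so few forecasters, one participant moving across the middle shifts it,
# which is one reason to treat such aggregates as only moderately resilient.
one_revision = median([0.05, 0.10, 0.12, 0.30, 0.25, 0.40])

print(f"aggregate={aggregate:.3f}, with_outlier={with_outlier:.3f}, one_revision={one_revision:.3f}")
```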
We—Metaculus and FLI—began this project by constructing and discussing a set of four scenarios set in 2030:
We currently plan on running subsequent AI Pathways workshops, the next being in June, as part of the overall AI Pathways project.
The AI Pathways project represents a new, as far as we’re aware, angle of attack for anticipating the future of AI risk and judging the impact of high-stakes policy decisions. We think that some meaningful progress was made in this first workshop, and we hope that future workshops will be even more directly useful for policy decisions. We would welcome your thoughts on the utility of this project. Please direct any feedback to email@example.com.
AI Pathways is an initiative developed by Metaculus in partnership with the Future of Life Institute. It is designed to support U.S. policymakers as they navigate risks from advanced AI. We thank the SMEs and Pro Forecasters who contributed to this work.
This report is a project of Metaculus. It was authored by Will Aldred, with feedback from Lawrence Phillips, Nate Morrison, Dan Schwarz, Christian Williams, and Gaia Dempsey.
Metaculus, a public benefit corporation, is an online forecasting platform and aggregation engine working to improve human reasoning and coordination on topics of global importance.
51% vs 68% that there’ll be more than 10 actors at the forefront in 2030. Number of forefront actors is certainly not the only noteworthy effect of hardware policies, but it’s what we chose to focus on in this workshop.
We count an actor as being at the TAI forefront if they carried out a training run within one order of magnitude of the training compute used in the run that produced the TAI.
We define moderate catastrophe as a disaster (or series of disasters occurring within a 30-day span) that triggers a public statement of concern from at least one head of state, cabinet member, foreign minister, national security advisor, or secretary of state from one of the P5.
The main text outlines the scenario in broad strokes. Below, one concrete story for the scenario is given. (We thank the experts and forecasters in the Pause Perdures group for constructing this story.)
GPT-5 is developed by OpenAI in mid-2025.
In 2026, a terrorist cell deliberately misuses a GPT-5-level LLM to create an auto-blackmail system. It uses spear phishing campaigns to hack email, CCTV cameras, etc. to discover a wide variety of legitimate blackmail material against politicians across the US, UK, China, South Korea, and Japan. Some of the blackmail material is also fabricated via deepfake audio and video, and it’s nearly impossible to tell the difference between the real and fake material. Much of the blackmail material is made public and many politicians resign; the fallout includes several notable suicides and one assassination arising from the revealed information.
This AI campaign is eventually discovered by an Interpol taskforce and gets wall-to-wall coverage in every major news outlet. This coverage galvanizes international coordination towards a pause.
Throughout this story, China is at least five years behind LLM SOTA due to increasingly strong and increasingly well coordinated export controls. China’s access to cloud compute has also been heavily restricted. At the time of the pause, China has not even trained an LLM as good as GPT-4 despite significant investment.
Generally speaking, AI capabilities have remained very capital intensive, so the set of actors has remained small. At the time of the pause, only (1) OpenAI + Microsoft, (2) Google + DeepMind, (3) Anthropic, and (4) the US government are capable of building models at GPT-5 level or beyond. Leading AI labs continue to be private and domiciled in democracies.
The US starts leading the world towards an international treaty banning AI training runs that use over ~1E28 FLOP. China initially resists these measures, but turns around and complies on account of the carrot offered: its access to AI compute supply chains is tied to participating in the treaty. (Note: an alternative story with a similar outcome could involve some successful "boxing" of China, instead of a treaty.) Secure chip technologies are developed such that centralized tracking of chips, knowledge of how they are being used, and remote shutdown are all possible. Installation of these features is mandated through supply chain controls from the U.S. government, with support from Japan, South Korea, and the Netherlands. The US government also invests heavily in cybersecurity for these actors to prevent exfiltration, and engages in a lot of compute governance.
Thanks for posting this. I am a bit surprised that the forecasts for hardware-related restrictions are so low. Are there any notes or details available on what led the group to those numbers?
In particular, the spread between firmware-based monitoring (7%) and compute capacity restrictions (15%) seems too small to me. I would have expected either a higher chance of restrictions or a lower chance of on-chip monitoring: both are predicated on similar decision-making steps, but implementing and operating an end-to-end firmware monitoring system has many technical hurdles.
Thanks for this question.
Firstly, I agree with you that firmware-based monitoring and compute capacity restrictions would require similar amounts of political will to happen. Then, in terms of technical challenges, I remember one of the forecasters saying they believe that "usage-tracking firmware updates being rolled out to 95% of all chips covered by the 2022 US export controls before 2028" is 90% likely to be physically possible, and 70% likely to be logistically possible. (I was surprised at how high these stated percentages were, but I didn't have time then to probe them on why exactly they were at these percentages—I may do so at the next workshop.)
Assume the technical challenges of compute capacity restrictions aren't significant, fix compute capacity restrictions at 15% likely, and apply the following crude calculation:
P(firmware) = P(compute) × P(firmware technical challenges are met)
= 0.15 × (0.9 × 0.7) = 0.15 × 0.63 = 0.0945 ≈ 9%
9% is a little above the reported 7%, which I take as meaning that the other forecasters on this question believe the firmware technical challenges are a little, but not massively, harder than the 90%–70% breakdown given above.
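For anyone who wants to vary the inputs, the crude calculation can be sketched in a few lines of Python. The variable names are illustrative, and the model itself (multiplying the two feasibility estimates as if they were independent conditions) is the rough assumption described above, not a method the workshop used. The last step simply inverts the same model to show what overall technical feasibility the reported 7% would imply.

```python
# Figures quoted in this thread; variable names are illustrative only.
p_compute_restrictions = 0.15    # workshop median: compute capacity restrictions
p_physically_possible = 0.90     # one forecaster's stated estimate
p_logistically_possible = 0.70   # the same forecaster's stated estimate
reported_p_firmware = 0.07       # workshop median: firmware-based monitoring

# Crude model: firmware monitoring needs the same political will as compute
# restrictions, plus both technical conditions being met (treated as independent).
p_technical = p_physically_possible * p_logistically_possible
p_firmware = p_compute_restrictions * p_technical

# Inverting the same model: the technical feasibility implied by the reported 7%.
implied_p_technical = reported_p_firmware / p_compute_restrictions

print(f"estimated P(firmware): {p_firmware:.4f}")
print(f"implied technical feasibility: {implied_p_technical:.2f}")
```

Under these assumptions the estimate is about 0.0945, and the reported 7% corresponds to an implied technical feasibility of roughly 0.47, compared with the 0.63 from the 90% and 70% figures.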
For what it's worth, my median scenario looks like:
Leading AI labs continue doing AI-assisted but primarily human-driven improvement to AI over the next 1-3 years. At some point during this time, a sufficiently competent general reasoning & coding model is created that shifts the balance of AI to human inputs. So the next generation of AI starts shifting in favor of an increasing share of contributions from the previous AI. With this new help, the lab releases a new model in somewhat less than a year. This new model contributes even more. In just a few months, a yet newer model is designed and trained. A few months after that, another model. Then the lab says, 'hold up, this thing is now getting scary powerful. How can we be sure we trust it?' (maybe a few more or fewer model generations will be required).
Then we have a weird stalemate situation where the lab has this new powerful model which is clearly superior to what it had a year ago, but is unsure about how trustworthy it is. It is safely contained and extensive tests are run, but the tests are far from definitive.
Meanwhile, the incautious open-source community continues to advance... The countdown is ticking until the open-source community catches up.
So there we are in 2026, being like, "What do we do now? We have definitely created a model powerful enough to be dangerous, but still don't have a sure way to align it. We know what advances and compute it took for us to get this far, and we know that the open-source efforts will catch up in around 3-4 years. Can we solve the alignment problem before that happens?"
I don't have a clear answer to what happens then. My guess is that a true complete solution to the alignment problem won't be found in time, and that we'll have to 'make do' with some sort of incomplete solution which is hopefully adequate to prevent disaster.
I also think there's an outside possibility of a research group not associated with one of the main labs making and publishing an algorithmic breakthrough which brings recursive self-improvement into reach of the open-source community suddenly and without warning. If that happens, what does humanity do, having kicked off this process in an open-source, anyone-can-do-it scenario, and suspecting that we have only a few months before further advances push the open-source models to dangerous levels of competence? I don't know. If anything, I'm hoping that the big labs with their reasonable safety precautions (which hopefully get improved over the next year or two) do actually manage to come in first, just because then it'll be at least temporarily contained, and the world's governments and large corporate actors will have the opportunity to officially verify for themselves that 'yes, this is a real thing that exists and is dangerous now'. That seems like a scenario more likely to go well than the sudden open-source breakthrough.
Thanks for this comment. I don't have much to add, other than: have you considered fleshing out and writing up this scenario in a style similar to "What 2026 looks like"?