The ultimate goal

by Alvin Ånestrand
5th Jul 2025
AI Alignment Forum · Linkpost from forecastingaifutures.substack.com
6 min read
3 comments

AnthonyC

Upvoted - I do think lack of a coherent, actionable strategy that actually achieves goals if successful is a general problem of many advocacy movements, not just AI. A few observations:

(1) Actually-successful historical advocacy movements that solved major problems usually did so incrementally over many iterations, taking the wins they could get at each moment while putting themselves in position to take advantage when further opportunities arose.

(2) Relatedly, don't complain about incremental improvements (yours or others'). Celebrate them, or no one will want to work with you or compromise with you, and you won't end up in position to get more wins later.

(3) Raising awareness isn't a terminal goal or a solution, but it gives others a reason to pay attention to you at all. If you have actually good proposals for what to do about a problem, and are in a position to make the case that your proposals are effective and practical, then a perception that the problem is real and a solution is necessary can be very helpful. If a politician solves a major problem that is not yet a crisis, or is not seen as a crisis by their constituents, then solving the problem just looks like wasting money/time/effort to the people that decide if they get to keep their jobs.

(4) Don't plan a path that leads to victory; plan so that all paths lead to victory. If you make a plan, any plan, to achieve an outcome that is sufficient, it will require many things to go right and will therefore fail for reasons you did not anticipate. It will also taint your further planning efforts along predetermined directions, limiting your ability to adapt to future opportunities and setbacks. Avoiding this failure mode is part of the upshot of seeking and celebrating incremental wins unreservedly and consistently, as long as those wins don't cut off the path to further progress.

(5) Being seen to have a long-term plan that no one currently in power would support seems like a quick way to get shut out of a conversation unless you already have some form of power such that you're hard to ignore. 

I was so glad the other day to see Nate Soares talk about the importance of openly discussing x-risks, and also the recent congressional hearings that actually started to ask about real AI risks, because these are openings to push the conversation in useful directions. I genuinely worry that AI safety orgs and advocates will make the mistakes that e.g. climate change activists often make: shutting down proposals that are clearly net improvements and likely to increase public support for further action, in favor of (in practice) counterproductively maintaining the status quo and turning people off. I started openly discussing x-risk with more and more people in my life last year, and found that people are quite receptive to it when it comes from someone they know and trust to generally be reasonable.

I do think there is value in having organizations around with the kinds of plans you are discussing, but I don't think, in general, those are the ones that actually get the opportunity to make big wins. I think they serve as generators of ideas that get filtered through more incremental and 'moderate' organizations over time, and make those other organizations seem like better partners to collaborate with. I don't have good data for this, more a general intuition from looking at a few historical examples.

Alvin Ånestrand

Thank you for sharing your thoughts! My responses:

(1) I believe most historical advocacy movements have required more time than we might have for AI safety. More comprehensive plans might speed things up. It might be valuable to examine what methods have worked for fast success in the past.

(2) Absolutely.

(3) Yeah, raising awareness seems like it might be a key part of most good plans.

(4) All paths leading to victory would be great, but I think even plans that would most likely fail are still valuable. They illuminate options and tie ultimate goals to concrete action. I find it very unlikely that failing plans are worse than no plans. Perhaps high standards for comprehensive plans have contributed to the current shortage of plans. “Plans are worthless, but planning is everything.” Naturally I will aim for all-paths-lead-to-victory plans, but I won't be shy about putting ideas out there that don't live up to that standard.

(5) I don't currently have much influence, so the risk would be sacrificing inclusion in future conversations. I think it's worth the risk.

I would consider it a huge success if the ideas were filtered through other orgs, even if they just help make incremental progress. In general, I think the AI safety community might benefit from having comprehensive plans to discuss and critique and iterate on over time. It would be great if I could inspire more people to try.

AnthonyC

(1) I agree, but don't have confidence that this alternate approach results in faster progress. I hope I'm proven wrong.

(4) Also agreed, but I think this hinges on whether the failing plans are attempted in such a way that they close off other plans, either by affecting planning efforts or by affecting reactions to various efforts.

(5) Fair enough. 

The ultimate goal

My AI forecasting work aims to improve our understanding of the future so we can prepare for it and influence it in positive directions. Yet one problem remains: how do you turn foresight into action? I’m not sure, but I have some thoughts about learning the required skills.


Say you learn about existential AI risks and consider redirecting your entire career to address these threats. Seeking career guidance, you find the 80,000 Hours website and encounter this page, which outlines two main approaches: technical AI safety research and AI governance/policy work.

You develop a career plan: "Educate yourself in governance, seek positions in policy advocacy organizations, and advocate for robust policies like whistleblower protections and transparency requirements for frontier AI labs in the US." It's a sensible, relatively robust plan that many concerned about AI pursue.

You work hard, and your efforts bear fruit! Better whistleblower protections are implemented. It’s a small, incremental improvement in governance—but that’s it. Other plans may culminate with marginal improvements in AI evaluation techniques, AI control, lab processes, public awareness, or international coordination.

Don’t get me wrong—marginal AI safety improvements are great! But these plans aren’t true AI safety solution paths. I struggle to find—or come up with—detailed plans that culminate in outcomes like:

  • An international oversight institution keeps AIs below capability thresholds through extensive compute monitoring, ensuring frontier AI development occurs only within a single, secure international project with mechanisms for avoiding dangerous concentration of power (see A Narrow Path)
  • All AI hardware is replaced with AI chips that can only run systems that have been verified to be completely safe (see Provably Safe Systems)
  • A global moratorium on AI development maintains capabilities below dangerous thresholds until we solve alignment and societal integration of highly advanced AI (see the PauseAI Proposal)

There are plenty of suggestions on what to aim for, yet I rarely encounter plans that are more sophisticated and concrete than “raise public awareness to pressure politicians into adopting our proposal” or “develop enabling technologies (e.g. alignment methods, or hardware assurance mechanisms) that may then hopefully be widely applied, perhaps enforced through policy.”

The vagueness is frustrating, and likely reduces success probabilities.

What Is a “Solution Path” Anyway?

By "solution path," I mean something those concerned about catastrophic AI risks can examine and think: "This looks like a reasonable, complete plan that could deliver outsized impact rather than marginal risk reduction." Something that explores the required steps and sub-goals, counters potential difficulties, and adapts to changing circumstances.

It doesn’t seem completely impossible to come up with such solution paths. Yet I don’t know strategies for developing solutions to world-spanning problems that unfold across years and span technical, social, and political domains—seemingly insurmountable, or "impossible", problems.

How do you examine the underlying mechanisms of the world, far more complex than any mechanical clockwork, and figure out how to make improvements?

Learning Strategies

In the domain of math, you can learn good problem-solving strategies by tackling a few difficult problems. I have a very simple process that I usually follow—I summarize the problem specification and list the relevant theory. That’s it. Solutions are often obvious when you have all the required information in front of you, as long as you have a good grasp of the related theory. Complex problems require effort, but they're essentially reduced to puzzles, solved by combining the information in the right way.

Forecasting is less structured than solving math problems, and you don’t get to know your score until reality gives you an answer. Nevertheless, there are traps to avoid and principles to follow that you can learn relatively quickly—just read Superforecasting by Philip Tetlock and practice making predictions. After a while you start noticing when you are overconfident—especially in areas where you think you are knowledgeable. You may learn to recognize the trap of underestimating task completion time (the planning fallacy), or the tendency to overestimate the probability of an outcome you prefer (goal-oriented motivated reasoning).

While the exact steps vary more for forecasting than for solving math problems, there are key rules of thumb to follow. Start with reference classes and trend lines before considering the specific circumstances of whatever you are trying to predict. Consider all relevant information before making a prediction, to avoid becoming prematurely attached to a probability estimate. When you make mistakes, think about what you can do better in the future instead of justifying your errors (good advice in general).
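
To make the "reference classes first" step concrete, here is a minimal sketch in Python. It is only an illustration under invented assumptions: the base rate and the log-odds adjustment are made-up numbers, not anything taken from this post.

```python
import math

# Outside view first: anchor on a reference-class base rate, then apply a
# case-specific adjustment in log-odds space (all numbers are hypothetical).
def logit(p: float) -> float:
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))

base_rate = 0.20        # made-up share of similar past cases where the event occurred
case_adjustment = 0.5   # made-up log-odds shift for evidence specific to this case

forecast = sigmoid(logit(base_rate) + case_adjustment)
print(f"Reference-class base rate: {base_rate:.2f}")
print(f"Adjusted forecast:         {forecast:.2f}")
```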

But how do you solve world-spanning, multi-year, seemingly insurmountable problems? Without years of trial and error, you need shorter feedback loops that develop the required skills. Forecasters have a clever trick for this—instead of waiting for the future to arrive and give you your prediction scores, simply compare your forecasts to predictions that you trust. You can go to the forecasting platform Metaculus and compare your predictions to the Community Predictions, which are known to be fairly accurate. (Turn off “Show Community Prediction by default on open questions” in the Metaculus settings to ensure you don’t see the Community Prediction before making your own prediction.)
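
As a rough illustration of that scoring trick (the questions and numbers below are hypothetical, and nothing here calls the real Metaculus API): if you treat the community probability q as the true probability, your expected Brier score is worse than the community's by exactly (p − q)², so the squared gap works as a quick proxy penalty while you wait for questions to resolve.

```python
# Hypothetical forecasts: (question, your probability p, community probability q).
# If q were the true probability, your expected Brier score would exceed the
# community's by (p - q)^2, so the squared gap serves as a proxy penalty.
forecasts = [
    ("Frontier lab publishes a new safety framework by 2026", 0.80, 0.65),
    ("A major AI governance bill passes in 2025", 0.25, 0.40),
    ("Benchmark X is saturated within a year", 0.55, 0.50),
]

total = 0.0
for question, p, q in forecasts:
    penalty = (p - q) ** 2
    total += penalty
    print(f"{question}: you {p:.2f}, community {q:.2f}, proxy penalty {penalty:.3f}")

print(f"Mean proxy penalty: {total / len(forecasts):.3f}")
```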

Is there an equivalent trick for learning strategies for solving societal problems like catastrophic AI risk?

Clever Tricks for Learning Faster

First approach: Consult experienced practitioners. Just as professors intuit promising PhD proposals from supervision experience, people who've tackled insurmountable problems might guide newcomers. Such mentors are rarer than professors, though.

Second approach: Study executed plans. Analyze what worked and what failed in previous ambitious plans to shape the future. Lessons from strategic foresight—the discipline of deliberately shaping the future—may offer relevant wisdom. Case studies of successes and failures specific to AI safety may also prove valuable, examining for instance the rise and fall of the SB 1047 bill, or what led to the formation of the AI Safety Institutes.

Third approach: Start with rough plans and iterate. Create an initial vague—and probably stupid—plan and refine it over time. This is a backwards way of doing things: you initially don’t know exactly what you are solving. It’s much easier to start out with a concrete problem formulation and a constrained solution space, but this approach might work for fuzzy, complicated, open-ended problems like “AI safety”. If each iteration moves the plan a little closer to something more complete and more likely to succeed, then each iteration also provides experience in strategic thinking—enabling the desired shorter feedback loop[1].

This third approach has the additional benefit of already being a strategy for solving complicated problems, not just a way of learning how to solve such problems.

Time for Reflection

One of my core motivations for doing forecasting—and writing this blog—is to better understand what can be done to ensure AI safety.

Initially, I believed that specific accurate predictions were the most strategically valuable output, so I focused on collecting forecasts from platforms with strong track records—as seen in my first four posts.

However, I increasingly recognized that good analyses matter as much as the predictions themselves—finding out the reasons some events are likely/unlikely, or why some things are especially hard to foresee. This led me to explore forecasting methodologies (like scenario forecasting) and tackle more complex subjects (like emergent alignment).

Yet even sophisticated analyses have limitations. Forecasting reaches its full potential only when used to explore possible futures—as demonstrated in the AI 2027 scenario. While often relying on probabilistic predictions, scenario work illuminates the underlying dynamics shaping what's to come. Thus, the Rogue Replication Timeline.

My post on Powerful Predictions (exploring forecasting’s applications in AI and AI safety), and the post you are reading now, place forecasting within a broader strategic context.

Though there are exceptions to the trend, there is a clear arc: examining predictions → understanding predictions → aggregating predictions into comprehensive future projections → exploring forecasting's broader strategic role[2].

Looking at this pattern and the ultimate goal of my forecasting work, a natural next step would be to integrate scenario forecasting and quantitative predictions into comprehensive solution paths for AI safety: starting with rough initial frameworks, then iterating while studying the strategic foresight literature and learning from experienced practitioners.

Since I can predict where my focus will eventually turn, why not start immediately?


Thank you for reading! If you found value in this post, consider subscribing!

  1. This resembles my general approach to understanding concepts or problems I find very difficult. I write down my thoughts about the issue and revisit it from time to time, perhaps adding some new knowledge or perspectives each time. I don’t claim to understand exactly how, but after enough revisits, understanding often emerges. My brain finds ways to reason about it. Maybe this works for insurmountable problems too?

  2. I realize this arc may be more obvious to me than to others, since I have inside knowledge of how my thinking and priorities have changed.