I've asked Claude to make a rough assessment of this. Tl;dr: the probability goes from 13% to ~27%, and this propagates to Plans C and D.
Ryan's reply: "sabotaging Chinese AI companies?"
Claude: Ryan's response is suggestive but incomplete. "Sabotaging Chinese AI companies" gestures at a possible answer but doesn't constitute a full defense because:
To be fair to Ryan, the original post does mention "helping the US government ensure non-proliferation/lead time" under Plan B, so the concern isn't entirely unaddressed. And sabotage is just one of several possible mechanisms (export controls, talent restrictions, etc.).
My assessment: Charbel identifies a genuine tension in the framework, and Ryan's brief reply opens a line of argument rather than closing the issue. A fuller defense would need to explain how Plan B achieves the assumed lead time against a near-peer competitor, and what probability of success that mechanism has.
Charbel: Reason like a superforecaster and assign a probability more accurate than 13%, taking all of this into account.
Claude: This requires decomposing the problem carefully.
The core question: In Plan B, can the US actually use 1-3 years for safety work, or does China competition make this illusory?
Key sub-scenarios within Plan B:
| Scenario | Description | Probability (within Plan B) | Takeover Risk (given scenario) |
|---|---|---|---|
| B1 | US successfully maintains lead via export controls, compute advantages, possible sabotage; safety work proceeds | ~25% | ~12% |
| B2 | Lead partially erodes, ~1 year of real safety time, rushed execution | ~35% | ~28% |
| B3 | Plan B is unstable—slowing means losing, race dynamics reassert, collapses toward Plan C/D | ~25% | ~40% |
| B4 | China reaches transformative AI first or simultaneously despite US efforts | ~15% | ~65% |
Weighted takeover-risk estimate for Plan B: ~27%
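For reference, this weighted estimate is a probability-weighted average of the sub-scenario risks; as a sketch (the notation below is mine, not part of the original exchange):

$$\Pr(\text{takeover} \mid B) = \sum_{i} \Pr(B_i \mid B)\,\Pr(\text{takeover} \mid B_i)$$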
Why higher than Ryan's 13%:
Why much lower than Charbel's implied ~75%:
My 90% confidence interval: 15-45%
Charbel identifies a genuine problem but overstates its severity. Ryan's estimate looks ~2x too optimistic.
Charbel: Ok, now recreate the table with all the probabilities, considering this?
Claude: Applying the same "China problem" logic consistently across scenarios:
| Plan | Probability of Scenario | Original Takeover Risk | Revised Takeover Risk | Rationale | Expected Risk Contribution |
|---|---|---|---|---|---|
| A | 5% | 7% | 8% | International agreements can still fail; slight upward revision | 0.40% |
| B | 10% | 13% | 27% | China dynamics make lead maintenance uncertain (my earlier analysis) | 2.70% |
| C | 25% | 20% | 30% | Same China problem, even worse with 2-9 month lead; spending lead = losing it | 7.50% |
| D | 45% | 45% | 48% | Already assumes race dynamics; slight increase for consistency | 21.60% |
| E | 15% | 75% | 75% | Already at high baseline; no revision needed | 11.25% |
| Total | 100% | - | - | - | 43.45% |
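As a quick sanity check on the arithmetic in the revised table, here is a minimal Python sketch; the probabilities and risks are copied from the table above, and the variable names are mine:

```python
# Scenario probability and revised takeover risk for each plan, from the table above.
plans = {
    "A": (0.05, 0.08),
    "B": (0.10, 0.27),
    "C": (0.25, 0.30),
    "D": (0.45, 0.48),
    "E": (0.15, 0.75),
}

total = 0.0
for name, (p_scenario, p_takeover) in plans.items():
    contribution = p_scenario * p_takeover  # expected risk contribution of this plan
    total += contribution
    print(f"Plan {name}: {contribution:.2%}")

print(f"Total expected takeover risk: {total:.2%}")
```

Running this reproduces the Expected Risk Contribution column and the 43.45% total.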
Key changes:
Implication: Charbel's critique doesn't just affect Plan B—it propagates through the framework. Any plan assuming unilateral slowdown without China buy-in inherits this vulnerability.
I have three main critiques:
| Plan | Probability of Scenario | Takeover Risk Given Scenario | Expected Risk Contribution |
|---|---|---|---|
| Plan A | 5% | 7% | 0.35% |
| Plan B | 10% | 13% | 1.30% |
| Plan C | 25% | 20% | 5.00% |
| Plan D | 45% | 45% | 20.25% |
| Plan E | 15% | 75% | 11.25% |
| Total | 100% | - | 38.15% |
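For reference, the Expected Risk Contribution column and the 38.15% total follow from the same probability-weighted sum (the arithmetic here is mine, shown for transparency):

$$0.05 \times 0.07 + 0.10 \times 0.13 + 0.25 \times 0.20 + 0.45 \times 0.45 + 0.15 \times 0.75 = 0.3815$$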
How much lead time we have to spend on x-risk-focused safety work in each of these scenarios:
What motivated you to write this post?
Thanks a lot for this comment.
Again, the call was the first step. The second step is finding the best red lines.
Here are more aggressive red lines:
Here are potential already operational ones from the preparedness framework:
Let's recap. We are calling for:
This is far from generic.
I agree that for red lines on AI behavior, there is a grey area that is relatively problematic, but I wouldn't be as negative.
The fact that there is no narrow Schelling threshold doesn't mean we shouldn't coordinate to create one. Superintelligence is also very blurry, in my opinion, and there is a substantial probability that we just boil the frog all the way to ASI - so even if there is no clear threshold, we need to create one. This call says that we should set some threshold collectively and enforce it with vigor.
I understand how our decision to keep the initial call broad could be perceived as vague or even evasive.
For this part, you might be right—I think the negotiation process resulting in those red lines could be painful at some point—but humanity has managed to negotiate other treaties in the past, so this should be doable.
"Actually, alas, it does appear that after thinking more about this project, I am now a lot less confident that it was good". --> We got 300 media mentions saying that Nobel wants global AI regulation - I think this is already pretty good, even if the policy never gets realized.
"making a bunch of tactical conflations, and that rarely ends well." --> could you give examples? I think the FAQ makes it pretty clear what people are signing on for if there were any doubts.
It feels to me that we are not talking about the same thing. Is the fact that we put the specific examples of red lines in the FAQ, rather than in the core text, the main crux of our disagreement?
You don't cite any of the examples listed under our FAQ question "Can you give concrete examples of red lines?"
Hi habryka, thanks for the honest feedback.
“the need to ensure that AI never lowers the barriers to acquiring or deploying prohibited weapons” - This is not the red line we have been advocating for; it is one red line proposed by a representative speaking at the UN Security Council. I agree that some red lines are pretty useless, and some might even be net negative.
"The central question is what are the lines!" The public call is intentionally broad on the specifics of the lines. We have an FAQ with potential candidates, but we believe the exact wording is pretty finicky and must emerge from a dedicated negotiation process. Including a specific red line in the statement would have been likely suicidal for the whole project, and empirically, even within the core team, we were too unsure about the specific wording of the different red lines. Some wordings were net negative according to my judgment. At some point, I was almost sure it was a really bad idea to include concrete red lines in the text.
We want to work with political realities. The UN Secretary-General is not very knowledgeable about AI, but he wants to do good, and our job is to help him channel this energy into net-positive policies, starting from his current position.
Most of the statement focuses on describing the problem. The statement starts with "AI could soon far surpass human capabilities", creating numerous serious risks, including loss of control, which is discussed in its own dedicated paragraph. It is the first time that such a broadly supported statement has explained the risks this directly, the cause of those risks (superhuman AI abilities), and the fact that we need to get our shit together quickly ("by the end of 2026"!).
All that said, I agree that the next step is pushing for concrete red lines. We're moving into that phase now. I literally just ran a workshop today to prioritize concrete red lines. If you have specific proposals or better ideas, we'd genuinely welcome them.
Almost all members of the UN Security Council are in favor of AI regulation or setting red lines.
Never before had the principle of red lines for AI been discussed so openly and at such a high diplomatic level.
UN Secretary-General Antonio Guterres opened the session with a firm call to action for red lines:
• “a ban on lethal autonomous weapons systems operating without human control, with [...] a legally binding instrument by next year”
• “the need to ensure that AI never lowers the barriers to acquiring or deploying prohibited weapons”
Then, Yoshua Bengio took the floor and highlighted our Global Call for AI Red Lines — now endorsed by 11 Nobel laureates and 9 former heads of state and ministers.
Almost all countries were in favor of some red lines:
China: “It’s essential to ensure that AI remains under human control and to prevent the emergence of lethal autonomous weapons that operate without human intervention.”
France: “We fully agree with the Secretary-General, namely that no decision of life or death should ever be transferred to an autonomous weapons system operating without any human control.”
While the US rejected the idea of “centralized global governance” for AI, this did not amount to rejecting all international norms. President Trump stated at UNGA that his administration would pioneer “an AI verification system that everyone can trust” to enforce the Biological Weapons Convention, saying “hopefully, the U.N. can play a constructive role.”
Extract from each intervention.
Right, but you also want to implement a red line on systems that would be precursors to this type of system, and this is why we have a red line on self-improvement.
Money is a bottleneck, yes.