This seems like a good thing to advocate for. I'm disappointed that they don't make any mention of extinction risk, but I think establishing red lines would be a step in the right direction.
At the beginning, we hesitated a lot about whether or not to include the term “extinction”.
The final decision not to center the message on "extinction risk" was deliberate: it would have prevented most of the heads of state and organizations from signing. Our goal was to build the broadest and most influential coalition possible to advocate for international red lines, which is what's most important to us.
By focusing on the concept of "losing meaningful human control," we were able to achieve agreement on the precursor to most worst-case scenarios, including extinction. We were advised, and early feedback from signatories confirmed, that this is a more concrete concept for policymakers and the public.
In summary, if you really want red lines to actually happen, adding the word "extinction" to this text is not necessary and has more costs than benefits.
This is a very valuable clarification, and I agree[1]. I really appreciate your focus on policy feasibility and concrete approaches.
From my own experience, most people in the regulatory space outside AI Safety either lack sufficient background knowledge about timelines and existential risk to meaningfully engage with these concerns and commit to enforceable measures[2], or, if they have some familiarity, become more skeptical due to the lack of consensus on probabilities, timelines, and definitions.
I will be following this initiative closely and promoting it to the best of my ability.
EDIT: I've signed with my institutional email and title.
For transparency: I knew about the red lines project before it was published. Furthermore, Charbel's and CeSIA's past work has shifted my own views on policy and international cooperation.
I expect that the popularity of IABIED and more involvement from AI Safety figures in policy will shift the Overton window in this regard.
When you say "prevented" do you just mean it would have been generally off-putting, or is there something specific that you're referring to here?
I'm disappointed that they don't make any mention of extinction risk
Agree, but I wonder if extinction risk is just too vague, at the moment, for something like this. Absent a fast takeoff scenario, AI doom probably does look something like the gradual and unchecked increase of autonomy mentioned in the call, and I'm not sure if there's enough evidence of a looming fast takeoff scenario for it to be taken seriously.
I think the stakes are high enough that experts should firmly state, like Eliezer, that we should back off way before fast takeoff even seems like a possibility. But I see why that may be less persuasive to outsiders.
Fast takeoff is not takeover is not extinction. For example, gradual disempowerment without a fast takeoff leads to takeover, which may lead to either extinction or permanent disempowerment depending on the values of the AIs.
I think it's quite plausible that AGIs merely de facto take over frontier AI R&D, with enough economic prosperity and human figureheads to ensure humanity's complacency. And when later there's superintelligence, humanity might notice that it's left with a tiny insignificant share in resources of the reachable universe and no prospect at all of ever changing this, even on cosmic timescales.
Agree; I'm strongly in favor of using a term like "disempowerment-risk" over "extinction-risk" in communication to laypeople – I think the latter detracts from the more important question of preventing a loss of control and emphasizes the thing that happens after, which is far more speculative (and often invites the common "sci-fi scenario" criticism).
Of course, it doesn't sound as flashy, but I think saying "we shouldn't build a machine that takes control of our entire future" is sufficiently attention-grabbing.
I suppose the problem is that in most fast takeoff scenarios there is little direct evidence before it happens and one should reason on priors.
Well, an unstoppable superintelligence paperclipping the entire planet is certainly a national security concern and a systematic human rights violation, I guess.
Jokes aside, some of the proposed red lines clearly do hint at that: no self-replication and immediate termination are clearly safeguards against the AIs themselves, not just human misuse.
I'm not remarkably well-versed in AI Governance and politics, but I tend to see this as a good sign. Some general thoughts, from the standpoint of a non-expert:
1-I think saying something akin to "we'll discuss and decide those risks together" is a good signal. It frames the AI Safety community as collaborative. And that's (in my view) true; there's a positive-sum relationship between the different factions. IASEAI is another example of the sort of bridging I think is necessary to gain legitimacy. I think it's a better signal than [Shut up and listen] "ban AGI" [this is not to be discussed] (where [] represents what opponents may infer from the 'raw' Ban AGI message). People do change their minds, if you let them speak theirs first.
2-I think this is the 'right level' of action. I don't believe in heroes stepping in at the last minute to save the day in frontier companies. I'm somewhat worried about the "large civil movement for AI Safety" agenda, in that it may turn out less controllable than expected. This sort of intervention is "broad, but focused on high-profile people", which seems to limit slip-ups while having a greater degree of commitment than the CAIS statement.
3-"Red Lines" are a great concept (as long as someone finds liability down the road) and offer 'customization'. This sounds empowering for whoever will sit at the table -they get to discuss where to set the red line. This shows that their autonomy is valued.
Caveat: I informally knew about the red lines project before they were published, so this may bias my impression.
Updates:
Glad to see that you're doing this kind of work, I know that getting a statement like this agreed upon must have taken a lot of coordination work. Thanks for your efforts and congratulations on getting it across the finish line :)
Thanks!
As an anecdote, some members of my team originally thought this project could be finished in 10 days after the French summit. I was more realistic, but even I was off by an order of magnitude. We learned our lesson.
Great work! Very much in favor.
and the Biological Weapons Convention (1975) were negotiated and ratified at the height of the Cold War, proving that cooperation is possible despite mutual distrust and hostility.
If the view given in this blog post is correct, I wouldn't use the BWC as a positive example:
One historical example about the importance of verification comes from the Biological Weapons Convention in 1972. It contained no verification measures at all: the USA and USSR just pledged not to develop biological weapons (and the Soviets denied having a program at all, a flat-out lie). The United States had already unilaterally destroyed its offensive weapons prior to signing the treaty, though the Soviets long expressed doubt that all possible facilities had been removed. The US lack of interest in verification was partially because it suspected that the Soviets would object to any measures to monitor their work within their territory, but also because US intelligence agencies didn’t really fear a Soviet biological attack.
Privately, President Nixon referred to the BWC as a “jackass treaty… that doesn’t mean anything.” And as he put it to an aide: “If somebody uses germs on us, we’ll nuke ‘em.”1
But immediately after signing the treaty, the Soviet Union launched a massive expansion of their secret biological weapon work. Over the years, they applied the newest genetic-engineering techniques to the effort of making whole new varieties of pathogens. Years later, after all of this had come to light and the Cold War had ended, researchers asked the former Soviet biologists why the USSR had violated the treaty. Some had indicated that they had gotten indications from intelligence officers that the US was probably doing the same thing, since if they weren’t, what was the point of a treaty without verification?
(Wikipedia doesn't directly confirm all of the post's claims, but does discuss the BWC's shortcomings with regard to verification and e.g. the Soviet Union's non-compliance)
Thanks!
Yeah, we were aware of this historical difficulty, and this is why we mention "enforcement" and "verification" in the text.
This is discussed briefly in the FAQ, but I think that an IAEA for AI, which would be able to inspect the different companies, would already help tremendously. And there are many other verification mechanisms possible, e.g. here:
I will see if we can add a caveat on this in the FAQ.
Banning autonomous self-replication, and the termination principle, seem overly broad and could cover systems that peacefully exist today. For example, evolutionary algorithms have self-replicating entities, and control systems can operate independently and be designed to never turn off.
That's not really accurate; any system operating today can usually be turned off as easily as executing a few commands in a terminal, or at worst, cutting power to some servers. Self-replication is similarly limited and contained.
If someone today even made something as basic as a simple LLM + engine that copies itself to other machines and keeps spreading, I'd say that is in fact bad, albeit certainly not world-ending bad.
This has already been demoed: https://arxiv.org/abs/2412.12140 - Frontier AI systems have surpassed the self-replicating red line
This paper shows it can be done in principle, but in practice current systems are still not capable enough to do this at full scale on the internet. And I think that even if we don't die directly from fully autonomous self-replication, self-improvement is only a few inches away, and that is a true catastrophic/existential risk.
Yeah, I've got no doubt it can be done, though as I said I don't think it's terribly dangerous yet. But my point is that you can perfectly well build lots of current systems without running afoul of this particular red line; self-replicating entities within the larger context of an evolutionary algorithm are not the same as letting loose a smart virus that copies itself across the internet.
The most important red line would have to be strong superintelligence, don't you think? I mean, if we have systems that are agentic in the way humans are, but surpass us in capabilities in the way we surpass animals, it seems like specific bans on the use of weapons, self-replication, and so on might not be very effective at keeping them in check.
Was it necessary to avoid mentioning ASI in the "concrete examples" section of the website to get these signatories on board? Are you concerned that avoiding that subject might contribute to the sense that discussion of ASI is non-serious or outside of the Overton window?
Right, but you also want to implement a red line on systems that would be precursors to this type of system, and this is why we have a red line on self-improvement.
So, in practice, what might that look like?
Of course, AI labs use quite a bit of AI in their capabilities research already: writing code, helping with hardware design, doing evaluations and RLAIF; even distillation and training itself could sort of be thought of as a kind of self-improvement. So, would the red line need to target just fully autonomous self-improvement? But just having a human in the loop to rubber-stamp AI decisions might not actually slow down an intelligence explosion by all that much, especially at very aggressive labs. So, would we need some kind of measure for how autonomous the capabilities research at a lab is, and then draw the line at "only somewhat autonomous"? And if we were able to define a robust threshold, could we really be confident that it would prevent ASI development altogether, rather than just slowing it down?
Suppose instead we had a benchmark that measured something like the capabilities of AI agents on long-term real-world tasks, like running small businesses and managing software development projects. Do you think it might make sense to draw a red line somewhere on that graph, targeting a dangerous level of capabilities directly rather than trying to prevent that level of capabilities from being developed by targeting research methods?
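For concreteness, here is a toy sketch of the kind of "red line on the graph" I have in mind; the benchmark, model names, scores, and threshold below are entirely made up:

```python
# Toy illustration only: hypothetical benchmark scores and a hypothetical
# capability threshold, showing what "a red line on that graph" could mean.

AGENTIC_CAPABILITY_RED_LINE = 0.8  # hypothetical score beyond which the red line counts as crossed

# Hypothetical history of (model release, score on a long-horizon agentic-task benchmark)
benchmark_history = [
    ("frontier-model-A", 0.35),
    ("frontier-model-B", 0.55),
    ("frontier-model-C", 0.72),
]

def first_crossing(history, threshold=AGENTIC_CAPABILITY_RED_LINE):
    """Return the first release whose score meets or exceeds the red line, or None."""
    for release, score in history:
        if score >= threshold:
            return release
    return None

release = first_crossing(benchmark_history)
if release is None:
    print("No release has crossed the capability red line yet.")
else:
    print(f"Red line crossed at {release}: trigger the pre-agreed halt-and-review process.")
```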
Being a Nobel laureate, a head of state, or someone holding a prestigious position in society does not mean you understand what you are talking about. And I do not trust the UN much (basically a mafia organization that does a lot of signaling work and does not even represent the human population).
Also, these international treaties have not been proven to work or to be robust in emergencies.
If it comes, it needs to come from the streets and the homes. If random people tomorrow drop AI, I guarantee you things will change. You need to focus on the basics, with a bottom-up approach.
If random people tomorrow drop AI, I guarantee you things will change
Doubts.
First of all, I appreciate your energy and time, and I am not saying all of this is not useful.
Second, I have a feeling that on this forum there is too much rationality going on, and too little talk about random chance, human behavior, and so on.
That said.
Point one
I am not sure how these surveys were done, but if they were done in a naive way they might be wrong (e.g. election polls are biased if you do not frame the questions well).
And even if they are accurate and many people say they are against AI, those people probably use it indirectly, as consumers of other services.
My point is that people, pro- and anti-AI alike, are using products that exploit the power of AI in some way, and hence they are fueling the incentives to expand AI capabilities and applications.
Even DuckDuckGo is using AI. There are too many incentives everywhere to use it.
It is on the end user to deny the application of AI to their everyday life.
Point two
We need to look at this from multiple perspectives. And I will tell you why I think this is different from biological or nuclear weapons.
AI is becoming a source of addiction for people. People are addicted to ChatGPT (e.g. it represents a safe space to them), or to social media scrolling. Go around on trains and buses: everybody is on their phone. People have lost the ability to get bored! Moms are no longer looking at their babies. Friends are constantly interacting on their phones, and AI will become their friend. Crazy.
So, once people become addicted to it, it will be more difficult, on average, to get them away from it. And they will increasingly fuel the process of self-improving AI systems. They will give big companies opportunities to do this work. This attracts money. And money attracts talent (I mean, I would also like to work on something fun that makes me money, and forget about the very-long-term consequences).
I would say this sort of addiction is similar to alcohol addiction. It is borderline. Did the ban on alcohol in the US work? Will a ban on AI work? That will depend on the people. If the population is borderline addicted to it, it is difficult to do anything.
This is one of those cases where you cannot really change someone if they do not change themselves from within.
All of this comes back to Point one: fueling economic incentives in a greedy system. We need to come back to first principles, and teach people basic principles.
Unfortunately, our current society is one of low self-awareness, easy to manipulate and steer. We really think we are smarter and better than our ancestors, but the truth is we are not.
The Global Call for AI Red Lines was released and presented at the opening of the first day of the 80th UN General Assembly high-level week.
It was initiated by CeSIA, the French Center for AI Safety, in partnership with The Future Society and the Center for Human-Compatible AI.
The full text of the call reads:
AI holds immense potential to advance human wellbeing, yet its current trajectory presents unprecedented dangers. AI could soon far surpass human capabilities and escalate risks such as engineered pandemics, widespread disinformation, large-scale manipulation of individuals including children, national and international security concerns, mass unemployment, and systematic human rights violations.
Some advanced AI systems have already exhibited deceptive and harmful behavior, and yet these systems are being given more autonomy to take actions and make decisions in the world. Left unchecked, many experts, including those at the forefront of development, warn that it will become increasingly difficult to exert meaningful human control in the coming years.
Governments must act decisively before the window for meaningful intervention closes. An international agreement on clear and verifiable red lines is necessary for preventing universally unacceptable risks. These red lines should build upon and enforce existing global frameworks and voluntary corporate commitments, ensuring that all advanced AI providers are accountable to shared thresholds.
We urge governments to reach an international agreement on red lines for AI — ensuring they are operational, with robust enforcement mechanisms — by the end of 2026.
This call has been signed by a coalition of 200+ former heads of state, ministers, diplomats, Nobel laureates, AI pioneers, industry experts, human rights advocates, political leaders, and other influential thinkers, as well as 70+ organizations.
Signatories include:
In Seoul, companies pledged to “Set out thresholds at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable”, but there is still nothing today that prevents Meta, xAI, or anyone else from setting thresholds too high, or not setting them at all. Without common rules, this race is a race to the bottom, and safety-conscious actors are going to be disadvantaged.
International AI red lines are critical because they establish clear prohibitions on the development, deployment, and use of systems that pose unacceptable risks to humanity. They are:
The red lines could focus either on AI behaviors (i.e., what the AI systems can do) or on AI uses (i.e., how humans and organizations are allowed to use such systems). The following examples show the kind of boundaries that can command broad international consensus.
Note that the campaign does not focus on endorsing any specific red lines. Their precise definition and clarification will have to be the result of scientific and diplomatic dialogue.
Examples of red lines on AI uses:
Examples of red lines on AI behaviors:
Red lines on AI behaviors have already started being operationalized in the Safety and Security frameworks from AI companies such as Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, and DeepMind’s Frontier Safety Framework. For example, for AI models above a critical level of cyber-offense capability, OpenAI states that “Until we have specified safeguards and security controls standards that would meet a critical standard, halt further development.” Definitions of critical capabilities that require robust mitigations would need to be harmonized and strengthened between those different companies.
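As a rough, purely illustrative sketch (not taken from any company's actual framework), an operationalized red line of this kind amounts to a gate on evaluation results, with thresholds that would be harmonized across developers; the capability names, numbers, and policy action below are hypothetical:

```python
# Purely illustrative sketch of a "critical capability" gate with harmonized thresholds.
# Capability names, threshold values, and the triggered action are all hypothetical.

CRITICAL_THRESHOLDS = {  # shared across developers under a hypothetical agreement
    "cyber_offense": 0.7,
    "autonomous_replication": 0.5,
    "automated_self_improvement": 0.6,
}

def breached_red_lines(eval_scores, thresholds=CRITICAL_THRESHOLDS):
    """Return the critical capabilities whose evaluated score meets or exceeds the
    agreed threshold; any non-empty result means further scaling should halt
    until the specified safeguards are in place."""
    return [
        capability
        for capability, threshold in thresholds.items()
        if eval_scores.get(capability, 0.0) >= threshold
    ]

# Example: evaluation results for a hypothetical frontier model
scores = {"cyber_offense": 0.75, "autonomous_replication": 0.2, "automated_self_improvement": 0.4}
breached = breached_red_lines(scores)
if breached:
    print(f"Red lines breached ({', '.join(breached)}): halt further development pending safeguards.")
else:
    print("No critical thresholds reached; continue under standard monitoring.")
```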
Yes, history shows that international cooperation on high-stakes risks is entirely achievable. When the cost of inaction is too catastrophic, humanity has consistently come together to establish binding rules to prevent global disasters or profound harms to humanity and global stability.
The Treaty on the Non-Proliferation of Nuclear Weapons (1970) and the Biological Weapons Convention (1975) were negotiated and ratified at the height of the Cold War, proving that cooperation is possible despite mutual distrust and hostility. The Montreal Protocol (1987) averted a global environmental catastrophe by phasing out ozone-depleting substances, and the UN Declaration on Human Cloning (2005) established a crucial global norm to safeguard human dignity from the potential harms of reproductive cloning. Most recently, the High Seas Treaty (2025) provided a comprehensive set of regulations for high seas conservation and serves as a sign of optimism for international diplomacy.
In the face of global, irreversible threats that know no borders, international cooperation is the most rational form of national self-interest.
There is no single global authority for AI, so enforcement would likely combine different levels of governance, including:
Several complementary pathways could be envisaged:
Any future treaty should be built on three pillars: a clear list of prohibitions; robust, auditable verification mechanisms; and the appointment of an independent body established by the Parties to oversee implementation.
You can access the website here: https://red-lines.ai
Happy to get feedback on the red lines agenda. CeSIA is currently considering continuing the work needed to ensure those red lines come to life, and there is still a lot of work ahead.
Thanks to Arthur Grimonpont, who drafted the first version of the FAQ. Thanks to Niki Iliadis, Su Cizem, and Tereza Zoumpalova for their significant revisions. Thanks to Lucie Philippon for helping with the LW post. And thanks to the many people who worked hard to make this happen.