This seems like a good thing to advocate for. I'm disappointed that they don't make any mention of extinction risk, but I think establishing red lines would be a step in the right direction.
At the beginning, we hesitated a lot about whether or not to include the term “extinction”.
The final decision not to center the message on "extinction risk" was deliberate: it would have prevented most of the heads of state and organizations from signing. Our goal was to build the broadest and most influential coalition possible to advocate for international red lines, which is what's most important to us.
By focusing on the concept of "losing meaningful human control," we were able to achieve agreement on the precursor to most worst-case scenarios, including extinction. We were advised, and early feedback from signatories confirmed, that this is a more concrete concept for policymakers and the public.
In summary, if you really want red lines to actually happen, adding the word "extinction" to this text is not necessary and has more costs than benefits.
This is a very valuable clarification, and I agree[1]. I really appreciate your focus on policy feasibility and concrete approaches.
From my own experience, most people in the regulatory space outside AI Safety either lack sufficient background knowledge about timelines and existential risk to meaningfully engage with these concerns and commit to enforceable measures[2], or, if they have some familiarity, become more skeptical due to the lack of consensus on probabilities, timelines, and definitions.
I will be following this initiative closely and promoting it to the best of my ability.
EDIT: I've signed with my institutional email and title.
For transparency: I knew about the red lines project before it was published. Furthermore, Charbel's and CeSIA's past work has shifted my own views on policy and international cooperation.
I expect that the popularity of IABIED and more involvement from AI Safety figures in policy will shift the Overton window in this regard.
When you say "prevented" do you just mean it would have been generally off-putting, or is there something specific that you're referring to here?
I'm disappointed that they don't make any mention of extinction risk
Agree, but I wonder if extinction risk is just too vague, at the moment, for something like this. Absent a fast takeoff scenario, AI doom probably does look something like the gradual and unchecked increase of autonomy mentioned in the call, and I'm not sure if there's enough evidence of a looming fast takeoff scenario for it to be taken seriously.
I think the stakes are high enough that experts should firmly state, like Eliezer, that we should back off way before fast takeoff even seems like a possibility. But I see why that may be less persuasive to outsiders.
Fast takeoff is not takeover is not extinction. For example, gradual disempowerment without a fast takeoff leads to takeover, which may lead to either extinction or permanent disempowerment depending on the values of the AIs.
I think it's quite plausible that AGIs merely de facto take over frontier AI R&D, with enough economic prosperity and human figureheads to ensure humanity's complacency. And when later there's superintelligence, humanity might notice that it's left with a tiny insignificant share in resources of the reachable universe and no prospect at all of ever changing this, even on cosmic timescales.
Agree; I'm strongly in favor of using a term like "disempowerment-risk" over "extinction-risk" in communication to laypeople – I think the latter detracts from the more important question of preventing a loss of control and emphasizes the thing that happens after, which is far more speculative (and often invites the common "sci-fi scenario" criticism).
Of course, it doesn't sound as flashy, but I think saying "we shouldn't build a machine that takes control of our entire future" is sufficiently attention-grabbing.
I suppose the problem is that in most fast takeoff scenarios there is little direct evidence before it happens and one should reason on priors.
Well, an unstoppable superintelligence paperclipping the entire planet is certainly a national security concern and a systematic human rights violation, I guess.
Jokes aside, some of the proposed red lines clearly do hint at that: no self-replication and immediate termination are clearly safeguards against the AIs themselves, not just human misuse.
I'm not remarkably well-versed in AI Governance and politics, but I tend to see this as a good sign. Some general thoughts, from the standpoint of a non-expert:
1-I think saying something akin to "we'll discuss and decide those risks together" is a good signal. It frames the AI Safety community as collaborative. And that's (in my view) true; there's a positive-sum relationship between the different factions. IASEAI is another example of the sort of bridging I think is necessary to gain legitimacy. I think it's a better signal than [Shut up and listen] "ban AGI" [this is not to be discussed] (where [] represents what opponents may infer from the 'raw' Ban AGI message). People do change their minds, if you let them speak theirs first.
2-I think this is the 'right level' of action. I don't believe in heroes stepping in at the last minute to save the day in frontier companies. I'm somewhat worried about the "large civil movement for AI Safety" agenda, in that it may turn out less controllable than expected. This sort of intervention is "broad, but focused on high-profile people", which seems to limit slip-ups while having a greater degree of commitment than the CAIS statement.
3-"Red Lines" are a great concept (as long as someone finds liability down the road) and offer 'customization'. This sounds empowering for whoever will sit at the table -they get to discuss where to set the red line. This shows that their autonomy is valued.
Caveat: I informally knew about the red lines project before they were published, so this may bias my impression.
Updates:
Glad to see that you're doing this kind of work, I know that getting a statement like this agreed upon must have taken a lot of coordination work. Thanks for your efforts and congratulations on getting it across the finish line :)
Thanks!
As an anecdote, some members of my team originally thought this project could be finished in 10 days after the French summit. I was more realistic, but even I was off by an order of magnitude. We learned our lesson.
Great work! Very much in favor.
and the Biological Weapons Convention (1975) were negotiated and ratified at the height of the Cold War, proving that cooperation is possible despite mutual distrust and hostility.
If the view given in this blog post is correct, I wouldn't use the BWC as a positive example:
One historical example about the importance of verification comes from the Biological Weapons Convention in 1972. It contained no verification measures at all: the USA and USSR just pledged not to develop biological weapons (and the Soviets denied having a program at all, a flat-out lie). The United States had already unilaterally destroyed its offensive weapons prior to signing the treaty, though the Soviets long expressed doubt that all possible facilities had been removed. The US lack of interest in verification was partially because it suspected that the Soviets would object to any measures to monitor their work within their territory, but also because US intelligence agencies didn’t really fear a Soviet biological attack.
Privately, President Nixon referred to the BWC as a “jackass treaty… that doesn’t mean anything.” And as he put it to an aide: “If somebody uses germs on us, we’ll nuke ‘em.”1
But immediately after signing the treaty, the Soviet Union launched a massive expansion of their secret biological weapon work. Over the years, they applied the newest genetic-engineering techniques to the effort of making whole new varieties of pathogens. Years later, after all of this had come to light and the Cold War had ended, researchers asked the former Soviet biologists why the USSR had violated the treaty. Some had indicated that they had gotten indications from intelligence officers that the US was probably doing the same thing, since if they weren’t, what was the point of a treaty without verification?
(Wikipedia doesn't directly confirm all of the post's claims, but does discuss the BWC's shortcomings with regard to verification and e.g. the Soviet Union's non-compliance)
Thanks!
Yeah, we were aware of this historical difficulty, and this is why we mention "enforcement" and "verification" in the text.
This is discussed briefly in the FAQ, but I think that an IAEA for AI, which would be able to inspect the different companies, would already help tremendously. And there are many other verification mechanisms possible, e.g. here:
I will see if we can add a caveat on this in the FAQ.
Banning autonomous self-replication, and the termination principle, seem overly broad and could cover systems that peacefully exist today. For example, evolutionary algorithms have self-replicating entities, and control systems can operate independently and be designed to never turn off.
That's not really accurate; any system operating today can usually be turned off as easily as executing a few commands in a terminal, or at worst, cutting power to some servers. Self-replication is similarly limited and contained.
If someone today even made something as basic as a simple LLM + engine that copies itself to other machines and keeps spreading, I'd say that is in fact bad, albeit certainly not world-ending bad.
This has already been demoed: https://arxiv.org/abs/2412.12140 - Frontier AI systems have surpassed the self-replicating red line
This paper shows it can be done in principle, but in practice current systems are still not capable enough to do this at full scale on the internet. And I think that even if we don't die directly from fully autonomous self-replication, self-improvement is only a few inches away, and that is a true catastrophic/existential risk.
Yeah, I've got no doubt it can be done, though as I said I don't think it's terribly dangerous yet. But my point is that you can perfectly well build lots of current systems without running afoul of this particular red line; self-replicating entities within the larger context of an evolutionary algorithm are not the same as letting loose a smart virus that copies itself across the internet.
The most important red line would have to be strong superintelligence, don't you think? I mean, if we have systems that are agentic in the way humans are, but surpass us in capabilities in the way we surpass animals, it seems like specific bans on the use of weapons, self-replication, and so on might not be very effective at keeping them in check.
Was it necessary to avoid mentioning ASI in the "concrete examples" section of the website to get these signatories on board? Are you concerned that avoiding that subject might contribute to the sense that discussion of ASI is non-serious or outside of the Overton window?
Right, but you also want to implement a red line on systems that would be precursors to this type of system, and this is why we have a red line on self-improvement.
So, in practice, what might that look like?
Of course, AI labs use quite a bit of AI in their capabilities research already: writing code, helping with hardware design, doing evaluations and RLAIF; even distillation and training itself could sort of be thought of as a kind of self-improvement. So, would the red line need to target just fully autonomous self-improvement? But just having a human in the loop to rubber-stamp AI decisions might not actually slow down an intelligence explosion by all that much, especially at very aggressive labs. So, would we need some kind of measure for how autonomous the capabilities research at a lab is, and then draw the line at "only somewhat autonomous"? And if we were able to define a robust threshold, could we really be confident that it would prevent ASI development altogether, rather than just slowing it down?
Suppose instead we had a benchmark that measured something like the capabilities of AI agents on long-term real-world tasks, like running small businesses and managing software development projects. Do you think it might make sense to draw a red line somewhere on that graph, targeting a dangerous level of capabilities directly rather than trying to prevent that level of capabilities from being developed by targeting research methods?
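For concreteness, here is a toy sketch of the kind of "red line on the graph" I have in mind; the benchmark, model names, scores, and threshold below are entirely made up:

```python
# Toy illustration only: hypothetical benchmark scores and a hypothetical
# capability threshold, showing what "a red line on that graph" could mean.

AGENTIC_CAPABILITY_RED_LINE = 0.8  # hypothetical score beyond which the red line counts as crossed

# Hypothetical history of (model release, score on a long-horizon agentic-task benchmark)
benchmark_history = [
    ("frontier-model-A", 0.35),
    ("frontier-model-B", 0.55),
    ("frontier-model-C", 0.72),
]

def first_crossing(history, threshold=AGENTIC_CAPABILITY_RED_LINE):
    """Return the first release whose score meets or exceeds the red line, or None."""
    for release, score in history:
        if score >= threshold:
            return release
    return None

release = first_crossing(benchmark_history)
if release is None:
    print("No release has crossed the capability red line yet.")
else:
    print(f"Red line crossed at {release}: trigger the pre-agreed halt-and-review process.")
```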
Being a Nobel laureate, a head of state, or someone holding a prestigious position in society does not mean you understand what you are talking about. And I do not trust the UN much (basically a mafia organization that does a lot of signaling work and does not even represent the human population).
Also, these international treaties have not been proven to work or to be robust in emergencies.
If it comes, it needs to come from the streets and the homes. If random people tomorrow drop AI, I guarantee you things will change. You need to focus on the basics, with a bottom-up approach.
If random people tomorrow drop AI, I guarantee you things will change
Doubts.
First of all, I appreciate your energy and time, and I am not saying all of this is not useful.
Second, I have a feeling that on this forum there is too much rationality going on, and too little talk about random chance, human behavior, and so on.
That said.
Point one
I am not sure how these surveys were done, but if they were done in a naive way they might be wrong (e.g. election polls are biased if you do not frame the questions well).
And even if they are accurate and many people say they are against AI, those people probably use it indirectly, as consumers of other services.
My point is that people, pro- and anti-AI alike, are using products that exploit the power of AI in some way, and hence they are fueling the incentives to expand AI capabilities and applications.
Even DuckDuckGo is using AI. There are too many incentives everywhere to use it.
It is on the end user to deny the application of AI to their everyday life.
Point two
We need to look at this from multiple perspectives. And I will tell you why I think this is different from biological or nuclear weapons.
AI is becoming a source of addiction for people. People are addicted to ChatGPT (e.g. it represents a safe space to them), or to social media scrolling. Go around on trains and buses: everybody is on their phone. People have lost the ability to get bored! Moms are no longer looking at their babies. Friends are constantly interacting on their phones, and AI will become their friend. Crazy.
So, once people become addicted to it, it will be more difficult, on average, to get them away from it. And they will increasingly fuel the process of self-improving AI systems. They will give big companies opportunities to do this work. This attracts money. And money attracts talent (I mean, I would also like to work on something fun that makes me money, and forget about the very-long-term consequences).
I would say this sort of addiction is similar to alcohol addiction. It is borderline. Did the ban on alcohol in the US work? Will a ban on AI work? That will depend on the people. If the population is borderline addicted to it, it is difficult to do anything.
This is one of those cases where you cannot really change someone if they do not change themselves from within.
All of this comes back to Point one: fueling economic incentives in a greedy system. We need to come back to first principles, and teach people basic principles.
Unfortunately, our current society is one of low self-awareness, easy to manipulate and steer. We really think we are smarter and better than our ancestors, but the truth is we are not.
The Global Call for AI Red Lines was released and presented at the opening of the first day of the 80th UN General Assembly high-level week.
It was initiated by CeSIA, the French Center for AI Safety, in partnership with The Future Society and the Center for Human-Compatible AI.
The full text of the call reads:
AI holds immense potential to advance human wellbeing, yet its current trajectory presents unprecedented dangers. AI could soon far surpass human capabilities and escalate risks such as engineered pandemics, widespread disinformation, large-scale manipulation of individuals including children, national and international security concerns, mass unemployment, and systematic human rights violations.
Some advanced AI systems have already exhibited deceptive and harmful behavior, and yet these systems are being given more autonomy to take actions and make decisions in the world. Left unchecked, many experts, including those at the forefront of development, warn that it will become increasingly difficult to exert meaningful human control in the coming years.
Governments must act decisively before the window for meaningful intervention closes. An international agreement on clear and verifiable red lines is necessary for preventing universally unacceptable risks. These red lines should build upon and enforce existing global frameworks and voluntary corporate commitments, ensuring that all advanced AI providers are accountable to shared thresholds.
We urge governments to reach an international agreement on red lines for AI — ensuring they are operational, with robust enforcement mechanisms — by the end of 2026.
This call has been signed by a coalition of 200+ former heads of state, ministers, diplomats, Nobel laureates, AI pioneers, industry experts, human rights advocates, political leaders, and other influential thinkers, as well as 70+ organizations.
Signatories include:
In Seoul, companies pledged to “Set out thresholds at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable”, but there is still nothing today that prevents Meta, xAI, or anyone else from setting thresholds too high, or not setting them at all. Without common rules, this race is a race to the bottom, and safety-conscious actors are going to be disadvantaged.
International AI red lines are critical because they establish clear prohibitions on the development, deployment, and use of systems that pose unacceptable risks to humanity. They are:
The red lines could focus either on AI behaviors (i.e., what the AI systems can do) or on AI uses (i.e., how humans and organizations are allowed to use such systems). The following examples show the kind of boundaries that can command broad international consensus.
Note that the campaign does not focus on endorsing any specific red lines. Their precise definition and clarification will have to be the result of scientific and diplomatic dialogue.
Examples of red lines on AI uses:
Examples of red lines on AI behaviors:
Red lines on AI behaviors have already started being operationalized in the Safety and Security frameworks from AI companies such as Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, and DeepMind’s Frontier Safety Framework. For example, for AI models above a critical level of cyber-offense capability, OpenAI states that “Until we have specified safeguards and security controls standards that would meet a critical standard, halt further development.” Definitions of critical capabilities that require robust mitigations would need to be harmonized and strengthened between those different companies.
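As a rough, purely illustrative sketch (not taken from any company's actual framework), an operationalized red line of this kind amounts to a gate on evaluation results, with thresholds that would be harmonized across developers; the capability names, numbers, and policy action below are hypothetical:

```python
# Purely illustrative sketch of a "critical capability" gate with harmonized thresholds.
# Capability names, threshold values, and the triggered action are all hypothetical.

CRITICAL_THRESHOLDS = {  # shared across developers under a hypothetical agreement
    "cyber_offense": 0.7,
    "autonomous_replication": 0.5,
    "automated_self_improvement": 0.6,
}

def breached_red_lines(eval_scores, thresholds=CRITICAL_THRESHOLDS):
    """Return the critical capabilities whose evaluated score meets or exceeds the
    agreed threshold; any non-empty result means further scaling should halt
    until the specified safeguards are in place."""
    return [
        capability
        for capability, threshold in thresholds.items()
        if eval_scores.get(capability, 0.0) >= threshold
    ]

# Example: evaluation results for a hypothetical frontier model
scores = {"cyber_offense": 0.75, "autonomous_replication": 0.2, "automated_self_improvement": 0.4}
breached = breached_red_lines(scores)
if breached:
    print(f"Red lines breached ({', '.join(breached)}): halt further development pending safeguards.")
else:
    print("No critical thresholds reached; continue under standard monitoring.")
```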
Yes, history shows that international cooperation on high-stakes risks is entirely achievable. When the cost of inaction is too catastrophic, humanity has consistently come together to establish binding rules to prevent global disasters or profound harms to humanity and global stability.
The Treaty on the Non-Proliferation of Nuclear Weapons (1970) and the Biological Weapons Convention (1975) were negotiated and ratified at the height of the Cold War, proving that cooperation is possible despite mutual distrust and hostility. The Montreal Protocol (1987) averted a global environmental catastrophe by phasing out ozone-depleting substances, and the UN Declaration on Human Cloning (2005) established a crucial global norm to safeguard human dignity from the potential harms of reproductive cloning. Most recently, the High Seas Treaty (2025) provided a comprehensive set of regulations for high seas conservation and serves as a sign of optimism for international diplomacy.
In the face of global, irreversible threats that know no borders, international cooperation is the most rational form of national self-interest.
There is no single global authority for AI, so enforcement would likely combine different levels of governance, including:
Several complementary pathways could be envisaged:
Any future treaty should be built on three pillars: a clear list of prohibitions; robust, auditable verification mechanisms; and the appointment of an independent body established by the Parties to oversee implementation.
You can access the website here: https://red-lines.ai
Happy to get feedback on the red lines agenda. CeSIA is currently considering continuing the work needed to ensure those red lines come to life, and there is still a lot of work ahead.
Thanks to Arthur Grimonpont, who drafted the first version of the FAQ. Thanks to Niki Iliadis, Su Cizem, and Tereza Zoumpalova for their significant revisions. Thanks to Lucie Philippon for helping with the LW post. And thanks to the many people who worked hard to make this happen.