I don't know the methodology behind how your statement was drafted (I don't think you even mention the statement directly in this post), but I will say that I think the correct methodology here is not to first come up with a statement and then email it out asking a bunch of researchers whether they'd sign it. That is a recipe for exactly what you're encountering here. The researchers you're reaching out to have a very different perspective on public communication, and on how they'd like to represent their beliefs, than you do in your role as de facto public relations managers for this project.
That is a good thing! We would like our researchers to primarily be thinking on simulacra level 1, and that means they will sign very different statements than what may seem optimal from the media's perspective.
However, as you point out, it can also be a bad thing, and decrease the PR manager's ability to... well... manage.
That is why I believe the solution is to first email the main researchers you'd like to sign the statement and ask what sorts of statements they would be willing to sign: what properties must a statement have for them to be comfortable putting their name below it? Then you create a statement that you know will have at least some base of support among researchers. You should expect a reasonable amount of iteration and compromise here, and not to produce, on your first try, a statement that everyone you want as a signatory will sign.
I will also say that it seems likely this is how the CAIS statement was drafted. They spent (if I remember right) quite a while workshopping it. That statement (to my knowledge) did not just appear out of the aether in its current state. It took work and compromise to get it.
Again, I don't know whether you actually fell for the trap I mention, but it seems likely given the pushback you're getting.
As I understand it, this is how scientific bodies' position statements get written. Scientists do not universally agree about the facts in their field, but they iterate on the statement until none of the signatories have any major objections.
I wouldn't support a "permanent ban" / no such thing as a permanent ban
This calls for allying with the people actually endorsing a permanent ban (who do exist), even if you consider a permanent ban impossible. Many people consider a temporary ban similarly impossible.
It does seem inconvenient to add some sort of "until it's actually safe to proceed" clause if it's a single sentence. There's also the option of framing this as a Pause rather than a ban, though in that case it needs to be made clear that the Pause is not tied to a fixed length of time.
Lucius Bushnaq's "some form of global ban or pause" seems to work for both purposes: the ambiguity between a ban and a pause clarifies both that the pause could be indefinite and that the ban could be temporary.
[Context: This post is aimed at all readers[1] who broadly agree that the current race toward superintelligence is bad, that stopping would be good, and that the technical pathways to a solution are too unpromising and hard to coordinate on to justify going ahead.]
TL;DR: We address objections to a statement supporting a ban on superintelligence, raised by people who agree that such a ban would be desirable.
Quoting Lucius Bushnaq:
I support some form of global ban or pause on AGI/ASI development. I think the current AI R&D regime is completely insane, and if it continues as it is, we will probably create an unaligned superintelligence that kills everyone.
We have been circulating a statement expressing ~this view, targeted at people who have done AI alignment/technical AI x-safety research (mostly outside frontier labs). Some people declined to sign even though they agreed with the expressed view. In this post, we want to address their objections (including those we agree with or think are reasonable).
But first, some context/preamble.
We wish you'd sign a short statement that roughly expresses a view that you share with many other people concerned with AI X-risk.
Why?
The primary reason is to raise awareness among the general population. The fact that many experts believe this (i.e. something akin to what Lucius stated) is a big deal. It would be completely astonishing to most people. The statement is aimed at these "most people". Its overarching objective is to be easy for the general population to read and understand, not to contain all the nuance of every relevant signatory's views.
In particular, there are certain groups/kinds of people whom we would especially want to be cognizant of this fact, such as policymakers and members of the general public who have not gone down the AI rabbit hole. Most people would bounce off text that is too dense/complicated/jargony.
We want to reduce the cost of spreading knowledge of this expert concern amongst the target audience[2] essentially as far as possible (within reasonable ethical boundaries). We want to create a short sentence in the language of (or understandable to, with little inferential distance) the vast majority of people, one that points out this fact in a way that allows the knowledge of it to be transmitted.
Such a concise summary would be extremely useful for communicating legibly and efficiently with other people. Put simply, its existence would contribute to society's making sense of this fact right now.
Once you understand this goal and perspective, please visit aistatement.com and see the structure and function that enable normal people to quickly realize and transmit the knowledge of the astonishing fact that many experts, luminaries, etc., think artificial intelligence poses significant risks of literal extinction (per the CAIS statement, at least on par with the threat of nuclear war).
Importantly, every bit of silence on the issue contributes, on the margin, to reinforcing society's belief that the fact is not true (especially given the various people shouting that AI is mere snake oil and all that matters is NVIDIA’s next quarter stock movement and other such nonsense[3]). Absence of Evidence Is Evidence of Absence, etc.
Even if the statement is only a flawed approximation of your view of the matter, as long as it doesn't say something that you think is false[4], you can give your additional caveats publicly[5]. I (Ishual) would be willing to spend some effort making it easier/more effective for your and other people's caveats to be processed by at least part of the general population's collective mind, so as to give a more accurate view than the legible statement alone can give (to say nothing of the incorrect default view).
However, we want to be extra clear given our previous post: Making an important fact legible is pure bonus points, and the reason to do it has nothing to do with any of the norms mentioned in that post. First, legibility requires cooperation amongst many people, and it is easier to state your exact position with your exact emphasis on your own than to cooperate despite having significant differences from "the centroid of the opinions of the cooperators." Second, it is likely that these bonus points are up for grabs by lots of people who are not laboring within "the belly of the beast" (if I had to guess, more outsiders than insiders would want to grab the bonus points).
Regardless of whether you are already taking a public stance, agreeing to sign a single sentence is a powerful contribution to spreading the awareness that such a sentence is a fair (for some purpose, such as guiding collective action) approximation of the belief of a large group of people. The contribution is not only momentary (one more signature under the statement right now) but also cumulative: the more names under the statement, the more people who agree with it but are hesitant to sign for social reasons will eventually be inclined to sign. Similarly, as the number of signatures grows, new people are attracted to the issue and engage with the topic, some of them eventually becoming signatories of this or a similar statement. Given all of that, it is likely quite an effective use of your attention[6].
Admittedly, a single sentence with perhaps ambiguous words is not a good tool for thinking. However, its purpose is not to (directly) guide anybody's thinking. Rather, its purpose is to draw attention to an important issue, so that people can engage with other sentences on the issue that are better at guiding thinking. Obviously, a single sentence will not replace your actual position; you can (and perhaps should) always express it elsewhere. Indeed, the core of agreement among many perspectives is the point of such statements, and that's why coordinating to make them legible is worth our time. We think there is a place for experts to assess the situation in a decision-relevant way without entering the political ring as much as some people advocating for a position might; in a sense, your "expert power" only applies to a small part of statement space, and it makes sense to focus such a statement around our collective "expert power".
We aren’t qualified to perfectly time a pause, and such a statement isn’t meant to be our opinion on when a pause ought to happen. We can assess a technical situation in a decision-relevant manner in a way that can be leveraged in conversations at any point in the future, and we can (if careful) contribute to sense-making without spending political capital. Sure, the experts don’t have the authority to time a pause, and they will never gain this authority, even if a big catastrophe happens. But they have the power to contribute to sense-making, which seems a necessary part of society metabolising a catastrophe into a net-positive change in the Overton window if and when it comes. Sense-making isn’t something you strategically turn on after a crisis; it is a necessary prerequisite for leveraging a crisis.
Finally, an expert doesn't actually need to believe that sense-making will be effective; an expert doesn't need to have an opinion about this at all (this is, in fact, not where their "expert power" lies). Refusing to speak one's mind because one thinks one would be the only one is a very human tendency, but it is sometimes inappropriate. One doesn't need to believe that most people will agree to get out of the burning house to decide to vote in favor of us getting out of the house[7].
Here is a striking example of how common knowledge dynamics can decide important group decisions, exemplifying both the danger of remaining silent and the power of speaking out: https://thezvi.substack.com/p/ai-moratorium-stripped-from-bbb
With the context/preamble/motivation out of the way, let's get to the actual objections raised to our statement. Some of them have already been answered in this section, but, for the sake of completeness, let's address them all one by one.
First, thank you. Being public is genuinely valuable and much better than remaining silent. However, adding your signature to a collective statement could greatly amplify the impact you're already having, with relatively little additional effort.[8][9]
(I.e., see the beginning of the previous section.)
You can always take a more nuanced public stance and also endorse a one-sentence summary. Alternatively, you could just endorse a one-sentence summary and let it be an improvement over silence, despite it not being literally the best possible thing you could do; it also requires far less effort from you than that best possible thing.
Creating a short statement that captures a large coalition's views inevitably involves trade-offs. It's genuinely difficult to craft language that includes everyone's preferred nuances while remaining brief enough for public communication.
Moreover, in practice, even asking a large group of people to sign something is quite costly,[10] and changing anything about the statement would require confirming, for each signatory of the previous version, that their endorsement carries over to the new version.
This is an extremely common feeling, perhaps universal, among potential signatories. Nearly everyone has their own preferred framing, which is natural given the complexity of the issue and the diversity of perspectives. You can always express your opinion publicly in the way you'd most prefer. But to make an important fact legible, many people must cooperate. You still want the statement to be sufficiently concise and simply stated that a normal person will be able to cache it and recall it when needed.
Your most preferred wording is likely unique to you or shared by only a small group. More generally, there are two objectives at play here that we should think about separately. The first objective is for experts to develop a good understanding of the issue and bring this understanding with them when they consider signing a statement. The second objective is for other people to understand the distribution of understandings among experts and to make some important aspects of it legible to normal people while accounting for all the constraints: understandability by most people, enough context, actual agreement, emphasis on the right things, acceptable ambiguity, memorability, etc.
We think this is a good reason not to sign.[11] However, to achieve the same level of common knowledge that you'd achieve by signing this flawed sentence[12], you may need to literally write a bestselling book, go on many podcasts, and acquire a public profile. This seems hard, to say the least. We hope we can develop better technology for establishing and refining common knowledge. If not, hopefully there are at least enough people who share your disagreement with the sentence that you'd be able to find a less costly way to establish that common knowledge.
We'd encourage reflecting on whether the specific disagreements genuinely outweigh the value of establishing common knowledge on the core point. Let us know what the issue is, in case I figure out a way to make your idea legible in the future.[13]
This is a common desire, and understandably so, as your particular concern likely feels central to you. However, including it would likely make the statement less acceptable to others with different priorities. The legible thing you endorse need not contain the sum of all your important thoughts. Public perception matters. A statement signed by many people carries weight, while a more detailed statement with few signatures (however thoughtful) doesn't create the same common knowledge. See above on the two distinct objectives: crafting and legibilizing a better understanding of an issue on the one hand, and broadcasting the action-relevant intersection of the field's views on the other.
This is a fair concern. Policy advocacy does feel different from technical assessment. However, policymakers consistently report that they need clear signals from experts to justify making difficult decisions[14]. And I don't think inaction is a neutral choice here[15]. Given your role, your public stance may have an outsized impact.
People understand that there is no button politicians can push that would make a ban impossible to overturn. There is no such thing as a permanent ban, and the statement does not contain a reference to permanence. The statement's intent is to put in place a ban at least as hard to overturn as the bans on CFCs, human cloning, or nuclear proliferation. Not impossibly hard.
Consider: Would you require a ban to be more easily reversed than existing precedents like the cloning ban, or is the standard precedent acceptable?
Every ban comes with a default mechanism for lifting it: an actual scientific consensus and grassroots support for unbanning, leading to future people making a case for unbanning that succeeds. If you're concerned we'll never reach sufficient confidence to safely lift a ban, we'd argue that's actually a reason to support a ban. It suggests the problem is too hard for humanity to solve, or at least too hard to be the best path toward a good future. It might even suggest that the difficulty would never be surmounted[16].
Instead, you probably think we will eventually have enough understanding that building ASI would at least benefit its developers/creators/growers, and the worry is that we'd keep it banned somewhat (or far) past the point of actual scientific consensus and grassroots support for unbanning. Examples: nuclear power, the FDA. We agree that the default mechanism comes with added risks of delay, though the magnitude is uncertain.[17]
My (Ishual's) understanding of how this typically goes is that experts signal that a ban is a good idea, and then some political leaders willing to advance legislation make something happen. The more permissive the signaling from the experts, the more likely it is that politicians succeed (this is just one of many factors at play).
It seems to me that the statement as written is compatible with this position (because you are not committing to supporting every conceivable ban). I think that it is good to make some nuance legible, but it would be better said in an accompanying white paper/long video.
This way, you can push for your favorite kind of ban from a position where some ban is seriously considered or actually in force.
You could say that a ban that lasts (much) longer than you'd like (in your expectation) has a mixed effect on the odds of "beneficial ASI," and it isn't super clear that the net effect is to decrease them.[18]
Specifically, the claim is that people in power / relevant decision-makers have a stake in the game and therefore will be unwilling to coordinate on an effective ban or even on an intervention that has a credible effect of decreasing the probability of building ASI too early.
However, one could have made the same argument about a bunch of "too good to pass up" technologies in the past: nuclear weapons, bio/chemical weapons, human cloning, human genome engineering, nuclear energy[19].
As faul_sname brought up recently:
Von Neumann was, at the time, a strong supporter of "preventive war." Confident even during World War II that the Russian spy network had obtained many of the details of the atom bomb design, Von Neumann knew that it was only a matter of time before the Soviet Union became a nuclear power. He predicted that were Russia allowed to build a nuclear arsenal, a war against the U.S. would be inevitable. He therefore recommended that the U.S. launch a nuclear strike at Moscow, destroying its enemy and becoming a dominant world power, so as to avoid a more destructive nuclear war later on. "With the Russians it is not a question of whether but of when," he would say. An oft-quoted remark of his is, "If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o'clock, I say why not one o'clock?"
… and yet, we did not.
Funny but ultimately irrelevant thought experiment warning
Similarly, isn't cloning "too good to pass up" as well? Imagine 1M clones of Von Neumann raised to deeply care about the national interests of the USA and to cooperate with each other. Clearly, there are versions of this that could either have lost control of the world or put the USA (or whoever gets the clones' loyalty) in a position to dominate the rest of the world economically (a lot more than it already does). Indeed, one might call this a form of mild superintelligence, and yet humanity took a hard pass on it (at least for now).
We agree that for an international ban to make sense, it has to be enforced on everyone, including people who keep insisting that superintelligence is just too good to pass up, never mind the risks.
This is a fair preference, and there are better and worse ways to accommodate it. For instance, maybe you'd prefer not to be out there alone. In that case, it would help to specify what conditions would make you comfortable: particular individuals signing alongside you, or a certain threshold number of signatures. If public visibility is a concern, consider conditional commitments (e.g. sign after ≥N peers sign).
This is a fair point. Presenting statements probably requires thinking carefully about how the information is shown. Pretending to be an expert if you are not one would be bad, so if the statement actually aims to signal something, it should aim to signal something legible, e.g., "person so-and-so has held such-and-such a role in this space." You may be underestimating the credentialing value of whatever role you've held.
This is likely true, but our unique strength lies in very strong arguments. We think the kind of fact we suggest coordinating around making legible (namely, "we need to legislatively prevent the development of ASI/AGI/existentially risky sorts of AI") is overdetermined, such that collectively we know of a stronger case for it than any one of us could articulate individually.
We don't think that signing a statement now would diminish our political capital. Some worry about reputational[20] attacks from signing this statement. Do people imagine that there would be an attack on our good name that leverages the signing of this statement? If more (hostile) attention is paid to the statement, we have strong reasons and arguments for holding our position, and we have some eloquent signatories to provide those arguments. We want to push back on the idea that silence preserves capital for later. Inaction has costs: there are well-resourced actors actively working to muddy public understanding of AI risks through repetition, misrepresentation, and flooding the discourse with noise[21][22], all of it aiming to bias society more strongly towards inaction and confusion on the issue of AI X-risk. If you remain silent, the world will get more confused, and it will become harder for the world to react well[23], even if at some point we encounter some "catastrophe" or "crisis".
Moreover, some people who share your view are speaking out already, whether you join them or not. However, if you join them, they will be more likely to pierce through the noise and confusion. Whether people like you choose to speak out is one of the things that determines whether they break through or whether the debate gets anchored by more salient, repeated messages. We are in a stag hunt situation. It might feel safe to save yourself for later, but it isn't. If others speak out without you and fail to break through, the window may close. Waiting for a "better moment" may mean missing the moment entirely. If you are worried about this, there are solutions that let you speak out iff sufficiently many others also would (e.g. a conditional commitment to speak out).
Essentially, I'd want to remind people that others will lie about the situation and say that the Ban Superintelligence statement is a fringe view, even in worlds where the majority of experts with even a small amount of freedom from incentives would endorse something quite similar, unless that majority creates a one-sentence counter that people can use to expose the lie.
Suppose that it is true that the pivotal point of action would occur just after a "catastrophe" or during some "crisis". For example, the Milton Friedman model of policy change says that "most policy change outside a prior Overton Window comes about by policy advocates skillfully exploiting a crisis".
However, effective crisis response requires groundwork. If we haven't built common knowledge beforehand, we should be skeptical that the world will react well even to a clear catastrophe in a domain as unprecedented/strange as AI risk. Also, I (Ishual) note that the CAIS statement is quite useful in conversations even 2 years after it was signed. There is no need to “perfectly time” this.
In short: To skillfully exploit a crisis, one needs to do a lot of prep. One important aspect of such prep is building common knowledge. A simple statement with many important signatories is an effective tool to build common knowledge.
If we missed your actual objection, then please help us make the list more useful by commenting.
If your objections have been addressed, but you still don't feel energized to do something about it, then please consider the following positive vision.
Many of us hope for a better world. I think even in the mundane realms of a world technologically like ours, so much more is possible. Functional institutions are possible[24]. Incremental progress on this is possible. We can, in fact, unilaterally (as a loose set of people who at least occasionally visit Less Wrong) make a part of the system work better (by creating this short sentence in the language of humanity for pointing at an important fact, as a first step). We can then either hope or optimize for other parts of the system to leverage this tool to have better conversations. But regardless of how long it takes for other parts of the system to do their parts, we will have enabled their success. Even if you think that there is a non-negligible chance that a functional room containing 10 people is sufficient to save the world, surely you agree it would be better, and more likely to succeed, if a greater part of the world were functional.
If you want to take this first step towards a better world, my (Ishual's) DMs are open :)
And if somehow you glimpse the greater project I am extremely vaguely gesturing at, and you'd maybe want to take a couple more steps beyond the first, my DMs are also open :D
Researchers who have enough experience with the various problems of making AI safe are our primary audience here, but despite our previous post, we are now addressing all such researchers, whatever they happen to be doing now, and whether or not they have already taken a public stance.
Policymakers and the public won't reconstruct expert sentiment from forum posts. And even if one/some of them did, they wouldn't have a concise summary of the action-relevant intersection of beliefs of a plurality of relevant researchers.
Mechanistically, they mostly just don't think about this fact explicitly, and they have various heuristics in play that result in not engaging with the broader discussion, or in not feeling like they have to do anything outside their autopilot on the issue. If they get concerned, they might want to spend a bit more money on some charismatic expert working on a technical solution. Since there is limited time to cross the gap between most people's intuitions and a sane view of the race toward superintelligence, the sheer implausibility of the true fact that more than a few cranks believe this ends up quite costly. We could (sort of) model this cost (either the time to actually make the fact seem less implausible with some tiny bits of public evidence, or the drag on the whole discussion and on the person having it) as "this person taking your failure to make this fact legible as evidence of the fact not being true."
The sentence "the sky is blue" is not strictly true, but it does stand in for a statement such as "the sky isn't more green than blue." If we lived in a world where many powerful and wealthy forces were trying to prevent the person on the street from understanding that the sky was blue (by claiming it was green), then I'd say the sentence is "true enough," and being silent about it (or just not achieving coordination among experts to say a simple sentence that makes sense to people) is "false enough".
Unless some force is somehow preventing you from doing so.
It is even somewhat competitive with much more costly efforts to make your opinion public, and in expectation definitely better than going on a podcast that 10K people will hear.
If you are so worried about us finding out that no one wants to exit the house (despite you and many others actually secretly wanting to get out), you can just say you don't think we should make the vote public unless the result is non-embarrassing (or you can just vote and not care much about the outcome). Pluralistic ignorance is real: a majority can believe X while mistakenly thinking everyone else believes "not X," leading to collective silence. You don't need to assess this alone; that's precisely what collective action helps reveal. Even if support seems limited now, coordination can change the landscape.
Unless you are really loud and constantly repeating your central position in a way that actually reaches lots of people, so many that your signature on the statement would not have significant additional impact, in which case carry on, you are outside the target audience :)
Moreover, consider a reversal test: if someone compiled a list of people who've expressed support for a ban based on public statements, would you want your name removed? If not, then making that support explicit seems consistent with your actual position.
In terms of time and effort, not necessarily monetary costs.
Likewise, this is a good reason to keep our statement relatively simple.
Let's assume that there is some chance that 100-300 people sign. This would have a very large impact. The impact is not confined to the moment one signs or to the moment when the statement goes public. Once the statement becomes a short conversational move, it will be used in conversations about AI-caused X-risks many times, and these signatures will boost the effectiveness of those conversations. The CAIS statement is likely the single most effective sentence when I (Ishual) speak to people about AI-caused X-risks.
[Made-up numbers warning:] To compare the impact of a single signature to some other intervention, we naively just divide by 100-300 and still get quite a large impact. Even if there isn't much difference in effectiveness between 100 and 300 signatures, there is still plenty of marginal impact from a single signature (if we want the sum of the marginal expected utils to equal the expected utils of the whole), because early signatures also make it easier for others to sign. If we kept insisting, we might naively expect the distribution of outcomes to be "bimodal" between fewer than ~100 and more than ~300 signatures, given the proportion of "agree and will sign" to "agree and won't sign," and naively assuming there are only really about 300 serious experts.
An alternative way to do it would be major public outreach (books, podcasts, etc.), though that's far costlier.
A policymaker once noted to me (Ishual) that the CAIS statement could be read ambiguously, as it didn't clearly signal that experts favor international cooperation to prevent rogue actors from building extinction-causing superintelligence. Indeed, one plausible reading of the CAIS statement is that the experts want more funding to work on their thing, or simply that this is why they do what they do, which is totally gonna mitigate those risks.
It would be strange to "blame" a class of people whose only membership criterion is being considered experts on a topic. Nevertheless, regarding that class of people, we think silence is quite bad, a public-only stance is good, and the effort you put into making important stuff legible is very good (bonus points). Assuming you already take a public stance, you are (merely) leaving lots of extra value on the table if you don't make sufficient efforts to achieve legibility.
either because the problem would remain intractable forever or because we'd be wiser to first achieve a good future some other way and then revisit the problem
It seems like there is a small but extremely passionate group of people who really want this tech even now, when it would be foolish to build it.
Maybe. Maybe bans are super sticky even in futures that get their shit together enough to "solve alignment," in which case my "true objection" would be that, on the margin, you'd have to delay ASI a lot for the delay to be worth even 1% more risk of actual extinction (and of squandering the lightcone).
This list was deliberately made so as to evoke negative, mixed, and positive feelings in a large fraction of the article's intended audience.
Either attacking their own reputation, or that of the whole "safety community".
Or in some cases, that the danger is non-existent.
Or, in some cases, a picture in opposition to reality.
We encounter various reactions that amount to doing nothing. You might not see a catastrophe now, but many people do see e.g. Trump shenanigans as a clear sign that AI will not be handled well. But then they just convince themselves that lying down and dying is the totality of their options.
A humanity that works a lot more for the benefit of humans is possible. Indeed, actually making huge progress here seems much easier than creating, on the first try, a superintelligence that deeply cares for us the way we'd want it to. So much needs to be said about this, and yet it will have to wait for another post.