- A convincing scenario cannot involve any bioweapons. Normal people just don’t know how vulnerable the human machine is. They think pandemics are just something that happens every 5-20 years, and don’t think about it besides that. They don’t think about the human body as a nano factory that’s vulnerable to targeted nano-attacks.
- A scenario that passes the mom test will also not include any drones. Yes, even though drones are currently used in warfare. Drones are the future. Drones are toys. Futuristic toys don’t sound like a realistic threat.
- A mom test scenario also shouldn’t involve any hacking. Regular people have no idea how insecure computer systems are. It’s basically safe to do online banking on a computer, which gives people the intuition that computers are mostly secure. Any story involving hacking violates that intuition.
I think there is a model of "normies" employed here which is a decent first approximation, but isn't precise enough. For instance, I think all three of the above things can be made real to normies. The "decent first approximation" model of normies presents an "impossible" (in the video game sense) problem, but the real problem is merely hard.
Here are some diffs between my and your model of normies:
1) I think most normies can be convinced that things that actually happened happened, even if those things sound very sci-fi/weird. It is worth thinking about how to do this well. It isn't always easy, but it is a great sanity check: can you actually convince normies of things that already happened? (The answer is yes, you just have to become good at it.) In particular, if you can't convince most normies that drones are a major factor in the Ukraine war, that Stuxnet and various other hacking ops were super impressive and scary, or that the Black Death killed a lot of people and disrupted society for years, then you are making mistakes that take at most about a month of deliberate effort to fix.
2) What presents naively as skepticism of sci-fi stuff is actually something else. Specifically, I think most people (including normies) have a default where everything they see "makes sense" to them, and if they reflect on it, they have some explanation for the evidence they see. If someone has a pretty bad explanation for some piece of evidence, your argument will not make sense to them. If you identify the places where their explanation of the evidence is misleading, you can dig there: either find a different example that triggers a different explanation, or make your argument flow more explicitly than you would for someone who shares your models of the observations you present.
3) Normies (and most non-normies, but that is a different post lol) don't run on arguments. They have intuitions that activate when you present some idea, and those determine whether they agree or not. People are bad at verbalizing which intuition is triggering, and there isn't a direct mapping between the intuition and what they say when it triggers. An example: someone was skeptical that AI could be lethal and asked "is Elon gonna program it to kill all <insert ingroup>s?"; I said "AI is not programmed line-by-line, we just tweak it through trial and error until it achieves some measurable capabilities", and they went "holy shit, now I get it". It is not obvious (at all) a priori that "I am skeptical that AI could hurt us" was linked to the "AI is just programmed software" intuition. And normies differ immensely in which intuitions they have. My default guess is that if it appears to you like "you must never mention X to normies if you want to convince them", then you have probably misidentified the intuitions you are up against.
4) Normies are a social species and implement a social epistemology. It isn't fully accurate to say that normies don't change their minds in isolation, but it does capture something real: there is huge variance in how their intuitions work, and they mostly change their minds as a group. If you have some kind of convincing short story that they feel they could repeat and use to justify themselves or convince others, that will do most of the work. If you give them that first, fully convincing them through "logic and facts" may go much more smoothly. Don't neglect the 5-minute version of what you are saying; the 5-minute version mostly needs to be plausible and defensible. Also, social proof is important: getting someone they respect to acknowledge a topic as valid does a lot of the work.
Ideas for making a thing that happened believable:
- Stuxnet has a Wikipedia page that is easy to point to. Pointing to things as a reflex sometimes works.
- There are probably some videos/books that are aimed at the general public, are somewhat entertaining for them, and have a visible view count of >1M.
- Maybe it was covered by one of the news channels that the average member of the general public acknowledges as existing and as a valid source of real things.
I think for the "scary hacks" category, it is worth coming up with 3-5 very different illustrative cases and looking for ways to connect them to other things the person (likely) already thinks are real.
I think it is worth doing the same with pandemics. (For instance there was the Black Death, there was the Spanish flu, and there was COVID, which (depending on the person's politics) is acknowledged as having been engineered.)
I think cases of "going hard" by human groups are worth knowing about.
I think if some such cases are very useful for making the case that an ASI could win in a fight against humanity, then it is worth first getting really good at establishing and discussing many of them in a fun and believable way. Once you succeed at enough of them to establish their existence, you can talk about how an ASI could pull these human-pullable levers.
I think that signing sufficiently clear open letters/similar things seems sufficient to count as "taking a public stance".
[Here is a first rough attempt at expressing an idea:]
I don't think pushing for an AI pause now is what most people have in mind (it definitely doesn't match what I had in mind when writing the post; keep in mind that the post isn't about what kind of public stance it would be effective for experts (inside and outside the frontier labs) to take). Instead, what matters imo is to have a legible thing (eg an open letter/statement) that says clearly that you think coordination to stop the current race dynamic would be good if feasible (because the current race carries serious risks of extinction/disempowerment). Let me try to make the distinction (between that and the pause letter) clearer:
The CAIS statement "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." clearly establishes that extinction from AI is a thing many experts worry about. It was signed in 2023. It didn't cry wolf. It is still useful today (probably more useful today than when it came out). If you need to convince a policy maker or a member of the general public that extinction is not a fringe concern, there is a clear statement you can quote, a simple link you can share, and a list of names that speaks for itself. In practice, I think this info is evergreen.
When a policy maker considers whether they should pay attention to or discuss a "pause", they have their own concerns, and they are capable of worrying about whether this is "crying wolf" without our help. But maybe they wonder "would experts actually oppose me if I spoke about this?" or "do experts actually think some international coordination is required?". Especially if they have experience with international coordination, they will suspect it is very hard to pull off, so they might just decide that pushing for alignment project X to get marginally more funding is a better use of their time.
> I signed the FLI pause letter in 2023, but looking back it seems a bit premature.
I basically agree. But expressing that plan A would be (a lot) better than plan B (along with some clarity on why) is useful even if it doesn't cause an immediate shift from plan B to plan A. There is a strong sense in which expressing "plan A >> plan B, because xrisks" is a lot more your job as an expert than timing a pause is. In other words, I think experts should make future cooperation easier rather than try to unilaterally create a Schelling point.
If we fully ignore the internal pressures within frontier labs against saying such things, and focus purely on the efficacy of taking a public stance, I think there is a way to take a public stance without incurring the cost you mention above.
If in the future there is some serious discussion toward an international agreement of some kind, my guess is that it will have been made possible/[much easier] by individual people clearly and credibly expressing that they'd be on board with implementing such an agreement, and I think experts could be helpful here (by clearly expressing something like "plan A >> plan B, because xrisks").
This sounds to me like there would actually be specific opportunities to express some of your true beliefs without worrying that it would cost you a lot (and some other opportunities where you would worry and would not take them). Would you agree with that?
> I agree that people should clearly state that they think there's a catastrophic risk, but I disagree that people should clearly state that they think we should pause.
If we premise (as this post does) that the person we are talking about actually believes an international ban would be a great improvement over the current mad AI race, then the above quote seems wrong to me.
I agree that experts should not pretend to have more authority than they do in judging whether we should pause. But they could still say 1) that the race is insane, 2) that an international ban seems like a great improvement, 3) that if such a ban were proposed, they would not oppose it, and 4) that they would in fact support it. If not the experts, then who? To be clear, I don't think the experts within the labs racing to build the tech are necessary here (that is not what the post is about). There are also experts outside the labs (and they don't have the [huge conflicts of interest]/pressure to filter(/falsify?) their speech). But if not the experts, then who would be better placed to say the above? If there is no one to say it, how does it get understood? If it doesn't get understood, coordination to actually move out of the status quo towards some kind of international agreement is much harder. The CEOs of some of the labs could say it, and that would definitely have an impact, but will they (lol)? Politicians could say it, but the backing of many experts would probably make it much easier for them to say.
I think "there are catastrophic risks" is way too weak and doesn't substitute, partly because "there are catastrophic risks, so please give more money to me / so put me in charge / so we must beat those less careful folks" are also possible readings. I also happen to have it on very good authority that some politicians, when informed that many experts recognize the risk of extinction and told the reasons why we should stop the mad AI race, will ask "but do the experts support stopping?", perhaps with a side of "or do they just want more money for their thing?"
I probably should have said "norm execution" (ie follow the norm). This might just be a cultural gap, but I think norm enforcement/execution/implementation works in many ways that are not threats. For instance: there is pizza at a conference. There is a norm that you shouldn't take all the pizza if there is a big line behind you. Some people break this norm. What happens? Do they get threatened? No! They just get dirty looks and people talking behind their backs. Maybe they get a reputation as the "pizza taker". In fact, nobody necessarily told them beforehand that taking all the pizza would break the norm.
I think there is a strange presumption here that one is owed my and others' maximum respect and friendship, as if anything less than that were a "punishment". That is pretty strange. If I have money in my pocket but will only give some to you based on how many "good deeds" I have seen you do, this is not a threat. I suppose that if you did not understand the motives, or if the motives were actually to get a specific person to do more "good deeds" (by telling them in advance what the reward would be), you could call it a bribe. But calling it a threat is obviously incorrect.
I think norm enforcement/execution/implementation can be, and in my case is, motivated by an aesthetic preference for "points" that are person A's to give, such as respect and friendship, 1) not going to someone who does not deserve them (in my eyes) and instead 2) going to someone who does deserve them. It is not primarily driven by a consequentialist desire for more people to do respect-and-friendship-deserving things. It is primarily driven by a desire for the points to match reality, and thus enable greater cooperation and further good things down the line.
I realized based on a few comments that the three norms I discuss in the post were seen by some as one giant strategy to produce more public stances from safety researchers. This is not the case. I am just talking to three different audiences and explaining a norm that I think makes sense (independently) for each of them.
(optional: my other comment is more important imo)
> I'm not concerned about someone being fired for this kind of thing, that would be pretty unwise on the labs' part as you risk creating a martyr
I think you ascribe too much competence/foresight/focus/care to the labs. I'd be willing to bet that multiple (safety?) people have been fired from labs in ways that made the lab look pretty bad. Labs make tactical mistakes sometimes. Wasn't there a thing at OpenAI, for instance (lol)? Of course it is possible(/probable?) that they would not fire anyone in a given case due to sufficient "wisdom", but we should not assign an extreme likelihood to that.
> Rather, I'm concerned about eg senior figures thinking worse of safety researchers as a whole because it causes a PR headache, eg viewing them as radical troublemakers, and this making theories of impact around influencing specific senior decision makers harder (and I'm more optimistic about those, personally)
Thank you Neel for stating this explicitly. I think this is very valuable information, and it matches what some of my friends told me privately. I would appreciate it a lot if you could give a rough estimate of your confidence that this would happen (ideally some probability/percentage). Additionally, I would appreciate it if you could say whether you'd expect such a consequence to be legible/visible or illegible (once it had happened). Finally, are there legible reasons you could share for your estimated credence that this would happen?
(to be clear: I am sad that you are operating under such conditions. I consider this evidence against expecting meaningful impact from the inside at your lab.)
I conclude from this that you really do see this post as a threat (also, you admitted in your first comment that there is no threat, so this comment now seems contradictory/bad-faith).
some thoughts:
- this isn't a threat by proxy and isn't a threat (though if it were a threat by proxy, then sure, it would be a threat)
- I am in the "others" group. I implement the norm I endorse in the post, and I am not threatening you. I don't want to sound dismissive, but you are not giving me a lot to work with here, and it sounds to me like either 1) you have a vague model of what a threat is that includes things that aren't threats, or 2) you are misunderstanding the post and our intent such that you model us as having made a threat.
I think your advice is quite a clear articulation of the strategy behind the CAIS statement. I think this is a great and difficult strategy to pursue. Indeed, I have been circulating https://superintelligence-statement.org/ for some time now (secretly), and I believe the people behind that statement first made sure it would get some of the major signatories before it even made its way to me.
I think the OP both talks about the object-level reasons people gave and shows that there is another strategy one might pursue: going around and convincing people who privately support an existing statement to actually support it publicly.