Wiki Contributions



There are all kinds of benefits to acting with good faith, and people should not feel licensed to abandon good faith dialogue just because they're SUPER confident and this issue is REALLY IMPORTANT. 

When something is really serious it becomes even more important to do boring +EV things like "remember that you can be wrong sometimes" and "don't take people's quotes out of context, misrepresent their position, and run smear campaigns on them; and definitely don't make that your primary contribution to the conversation".

Like, for Connor & people who support him (not saying this is you Ben): don't you think it's a little bit suspicious that you ended up in a place where you concluded that the very best use of your time in helping with AI risk was tweet-dunking and infighting among the AI safety community? 


I don't expect most people to agree with that point, but I do believe it. It ends up depending on a lot of premises, so expanding on my view there in full would be a whole post of its own. But to try to give a short version: 

There are a lot of specific reasons I think having people working in AI capabilities is so strongly +EV. But I don't expect people to agree with those specific views. The reason I think it's obvious is that even when I make massive concessions to the anti-capabilities people, these organizations... still seem +EV? Let's make a bunch of concessions:

1. Alignment will be solved by theoretical work unrelated to capabilities. It can be done just as well at an alignment-only organization with limited funding as it can at a major AGI org with far more funding. 

2. If alignment is solved, that automatically means future ASI will be built using this alignment technique, regardless of whether leading AI orgs actually care about alignment at all. You just publish a paper saying "alignment solution, pls use this Meta" and Meta will definitely do it.

3. Alignment will take a significant amount of time - probably decades. 

4. ASI is now imminent; these orgs have reduced timelines to ASI by 1-5 years.

5. Our best chance of survival is a total stop, which none of the CEOs of these orgs support.

Even given all five of these premises... Demis Hassabis, Dario Amodei, and Sam Altman have all increased the chance of a total stop, by a lot. By more than almost anyone else on the planet, in fact. Yes, even though they don't think it's a good idea right now and have said as much (I think? haven't followed all of their statements on AI pause). 

That is, the chance of a total stop is clearly higher in this world than in the counterfactual one where any of Demis/Dario/Sam didn't go into AI capabilities, because a CEO of a leading AI organization saying "yeah I think AI could maybe kill us all" is something that by default would not happen. As I said before, most people in the field of AI don't take AI risk seriously; this was even more true back when they first entered the field. The default scenario is one where people at NVIDIA and Google Brain and Meta are reassuring the public that AI risk isn't real.

So in other words, they are still increasing our chances of survival, even under that incredibly uncharitable set of assumptions.

 Of course, you could cook these assumptions even more in order to make them -EV - if you think that a total stop isn't feasible, but still believe all of the other four premises, then they're -EV. Or you could say "yeah, we need a total stop now, because they've advanced timelines, but if these orgs didn't exist then we totally would have solved alignment before Meta made a big transformer model and trained it on a lot of text; so even though they've raised the chances of a total stop they're still a net negative." Or you could say "the real counterfactual about Sam Altman isn't if he didn't enter the field. The real counterfactual is the one where he totally agreed with all of my incredibly specific views and acted based on those."

I.e. if you're looking for excuses to be allowed to believe that these orgs are bad, you'll find them. But that's always the case. Under real worldviews - even under Connor's worldview, where he thinks a total stop is both plausible and necessary - OAI/DM/Anthropic are all helping with AI risk. Which means that their beneficiality is incredibly robust, because again, I think many of the assumptions I outlined above are false & incredibly uncharitable to AGI orgs.


Yeah, fair enough.

But I don't think that would be a sensible position. The correct counterfactual is in fact the one where Google Brain, Meta, and NVIDIA led the field. Like, if DM + OpenAI + Anthropic didn't exist - something he has publicly wished for - that is in fact the most likely situation we would find. We certainly wouldn't find CEOs who advocate for a total stop on AI.


(Ninth, I am aware of the irony of calling for more civil discourse in a highly inflammatory comment. Mea culpa)


I believe you're wrong on your model of AI risk and you have abandoned the niceness/civilization norms that act to protect you from the downside of having false beliefs and help you navigate your way out of them. When people explain why they disagree with you, you accuse them of lying for personal gain rather than introspect about their arguments deeply enough to get your way out of the hole you're in.

First, this is a minor point where you're wrong, but it's also a sufficiently obvious point that it should hopefully  make clear how wrong your world model is: AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I'd like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bring up how CEOs of leading AI companies acknowledge AI risk as a talking point, so I'd hope that on some level you're aware that your success in public advocacy would be massively reduced in the counterfactual case where the leading AI orgs are Google Brain, Meta, and NVIDIA, and their leaders were saying "AI risk? Sounds like sci-fi nonsense!" 

The fact that people disagree with your preferred method of reducing AI risk does not mean that they are EVIL LIARS who are MAKING YOUR JOB HARDER and DOOMING US ALL.

Second, the reason that a total stop is portrayed as an extreme position is because it is. You can think a total stop is correct while acknowledging that it is obviously an extreme course of action that would require TREMENDOUS international co-ordination and would have to last across multiple different governments. You would need both Republicans and Democrats in America behind it, because both will be in power across the duration of your indefinite stop, and ditto for the leadership of every other country. It would require military action to be taken against people who violate the agreement. This total stop would not just impact AI, because you would need insanely strong regulations on compute - it would impact everyone's day to day life. The level of compute you'd have to restrict would only escalate as time went on due to Moore's law. And you and others talk about carrying this on for decades. This is an incredibly extreme position that requires pretty much everyone in the world to agree AI risk is both real and imminent, which they don't. Leading to...

Third: most people - both AI researchers and the general public - are not seriously concerned about AI risk. No, I don't believe your handful of sketchy polls. On the research side, whether it's on the machine learning subreddit, on ML specific discords, or within Yoshua Bengio's own research organization[1], the consensus in any area that isn't specifically selected for worrying about AI risk is always that it's not a serious concern. And on the public side, hopefully everyone realizes that awareness & agreement on AI risk is far below where climate change is.

Your advocacy regularly assumes that there is a broad consensus among both researchers and the public that AI risk is a serious concern. Which makes sense because this is the only way you can think a total stop is at all plausible. But bad news: there is nowhere close to such a consensus. And if you think developing one is important, you should wake up every morning & end every day praising Sam Altman, Dario Amodei, and Demis Hassabis for raising the profile of AI risk to such an extent; but instead you attack them, out of a misguided belief that somehow, if not for them, AI progress wouldn't happen.

Which leads us to number four: No, you can't get a total stop on AI progress through individual withdrawal. You and others in the stop AI movement regularly use the premise that if only OpenAI + Anthropic + DeepMind would just stop, AI would never get developed and we could all live happily ever after, so therefore they are KILLING US ALL. 

This is false. Actually, there are many people and organizations that do not believe AI risk is a serious concern and only see AI as a technology with massive potential economic benefits; as long as this is the case, AI progress will continue. This is not a prisoner's dilemma where if only all the people worried about AI risk would "co-operate" (by ceasing AI work) AI would stop. Even if they all stopped tomorrow, progress would continue.

If you want to say they should stop anyway because that would slow timelines, I would like to point out that that is completely different from a total stop and cannot be justified by praising the virtues of a total stop. Moreover, it has the absolutely massive drawback that now AI is getting built by a group of people who were selected for not caring about AI risk.

Advocating for individual withdrawal by talking about how good a total, globally agreed upon stop would be is deceptive - or, if I wanted to use your phrasing, I could say that doing so is LYING, presumably FOR PERSONAL GAIN and you're going to GET US ALL KILLED you EVIL PERSON. Or I guess I could just not do all that and just explain why I disagree with you - I wonder which method is better?

Fifth, you can't get a total stop on AI progress at all, and that's why no one will advocate for one. This follows from points two and three and four. Even if somehow everyone agreed that AI risk was a serious issue a total stop would still not happen the same way that people believing in climate change did not cause us to abandon gasoline.

Sixth, if you want to advocate for a total stop, that's your prerogative, but you don't get to choose that that's the only way. In theory there is nothing wrong with advocating for a total stop even though it is completely doomed. After all, nothing will come of it and maybe you'll raise awareness of AI risk while you're doing it. 

The problem is that you are dead set on torching other alignment plans to the ground all for the sake of your nonworkable idea. Obviously you are going after AI capabilities people all the time but here you are also going against people who simply advocate for positions less stringent than you. Everyone needs to fall in line and advocate for your particular line of action that will never happen and if they don't they are liars and going to kill us all. This is where your abdication from normal conversational norms makes your wrong beliefs actively harmful.

Leading to point number seven, we should talk about AI risk without constantly accusing each other of killing us all. What? But if I believe Connor's actions are bad for AI risk surely that means I should be honest and say he's killing us all, right? No, the same conversational norms that work for discussing a tax reform apply just as much here. You're more likely to get a good tax reform if you talk it out in a civil manner, and the same goes for AI risk. I reject the idea that being hysterical and making drastic accusations actually helps things, I reject the idea that the long term thinking and planning that works best for literally every other issue suddenly has to be abandoned in AI risk because the stakes are so high, I reject the idea that the only possible solution is paralysis.

Eighth, yes, working in AI capabilities is absolutely a reasonable alignment plan that raises odds of success immensely. I know, you're so overconfident on this point that even reading this will trigger you to dismiss my comment. And yet it's still true - and what's more, obviously so. I don't know how you and others egged each other into the position that it doesn't matter whether the people working on AI care about AI risk, but it's insane.

  1. ^

    From a recent interview

    D’Agostino: How did your colleagues at Mila react to your reckoning about your life’s work?

    Bengio:The most frequent reaction here at Mila was from people who were mostly worried about the current harms of AI—issues related to discrimination and human rights. They were afraid that talking about these future, science-fiction-sounding risks would detract from the discussion of the injustice that is going on—the concentration of power and the lack of diversity and of voice for minorities or people in other countries that are on the receiving end of whatever we do.


This post is fun but I think it's worth pointing out that basically nothing in it is true.

-"Clown attacks" are not a common or particularly effective form of persuasion
-They are certainly not a zero day exploit; having a low status person say X because you don't want people to believe X has been available to humans for our entire evolutionary history
-Zero day exploits in general are not a thing you have to worry about; it isn't an analogy that applies to humans because we're far more robust than software. A zero day exploit on an operating system can give you total control of it; a 'zero day exploit' like junk food can make you consume 5% more calories per day than you otherwise would.
-AI companies have not devoted significant effort to human thought steering, unless you mean "try to drive engagement on a social media website"; they are too busy working on AI.
-AI companies are not going to try to weaponize "human thought steering" against AI safety
-Reading the sequences wouldn't protect you from mind control if it did exist
-Attempts at manipulation certainly do exist but it will mostly be mass manipulation aimed at driving engagement and selling you things based off of your browser history, rather than a nefarious actor targeting AI safety in particular


Seems like we mostly agree and our difference is based on timelines. I agree the effect is more of a long term one, although I wouldn't say decades. OpenAI was founded in 2015 and raised the profile of AI risk in 2022, so in the counterfactual case where Sam Altman was dissuaded from founding OpenAI due to timeline concerns, AI risk would have much lower public credibility less than a decade.

Public recognition as a researcher does seem to favour longer periods of time though, the biggest names are all people who've been in the field multiple decades, so you have a point there.


I think we're talking past each other a bit. I'm saying that people sympathetic to AI risk will be discouraged from publishing AI capability work, and publishing AI capability work is exactly why Stuart Russell and Yoshua Bengio have credibility. Because publishing AI capability work is so strongly discouraged, any new professors of AI will to some degree be selected for not caring about AI risk, which was not the case when Russell or Bengio entered the field.


The focus of the piece is on the cost of various methods taken to slow down AI timelines, with the thesis being that across a wide variety of different beliefs about the merit of slowing down AI, these costs aren't worth it. I don't think it's confused to be agnostic about the merits of slowing down AI when the tradeoffs being taken are this bad. 

Views on the merit of slowing down AI will be highly variable from person to person and will depend on a lot of extremely difficult and debatable premises that are nevertheless easy to have an opinion on. There is a place for debating all of those various premises and trying to nail down what exactly the benefit is of slowing down AI; but there is also a place for saying "hey, stop getting pulled in by that bike-shed and notice how these tradeoffs being taken are not worth it given pretty much any view on the benefit of slowing down AI".

> I think you’re confused about the perspective that you’re trying to argue against. Lots of people are very confident, including “when pressed”, that we’d probably be in a much better place right now if the big AGI labs (especially OpenAI) had never been founded. You can disagree, but you shouldn’t put words in people’s mouths.

I was speaking from experience, having seen this dynamic play out multiple times. But yes, I'm aware that others are extremely confident in all kinds of specific and shaky premises.


> I just think it’s extraordinarily important to be doing things on a case-by-case basis here. Like, let’s say I want to work at OpenAI, with the idea that I’m going to advocate for safety-promoting causes, and take actions that are minimally bad for timelines. 

Notice that this is phrasing AI safety and AI timelines as two equal concerns that are worth trading off against each other. I don't think they are equal, and I think most people would have far better impact if they completely struck "I'm worried this will advance timelines" from their thinking and instead focused solely on "how can I make AI risk better".

I considered talking about why I think this is the case psychologically, but for the piece I felt it was more productive to focus on the object level arguments for why the tradeoffs people are making are bad. But to go into the psychological component a bit:

-Loss aversion: The fear of making AI risk worse is greater than the joy of making it better.

-Status quo bias: Doing something, especially something like working on AI capabilities, is seen as giving you responsibility for the problem. We see this with rhetoric against AGI labs - many in the alignment community will level terrible accusations against them, all while having to admit when pressed that it is plausible they are making AI risk better.

-Fear undermining probability estimates: I don't know if there's a catchy phrase for this one but I think it's real. The impacts of any actions you take will be very muddy, indirect, and uncertain, because this is a big, long term problem. When you are afraid, this makes you view uncertain positive impacts with suspicion and makes you see uncertain negative impacts as more likely. So people doubt tenuous contributions to AI safety like "AI capability researchers worried about AI risk lend credibility to the problem, thereby making AI risk better", but view tenuous contributions to AI risk like "you publish a capabilities paper, thereby speeding up timelines, making AI risk worse" as plausible.

Load More