If you're so worried about AI risk, why don't you just turn off the AI when you think it's about to do something dangerous?

On Friday, members of the OpenAI board, including Ilya Sutskever, decided that they wanted to "turn off" OpenAI's rapid push towards smarter-than-human AI by firing CEO Sam Altman.

The result seems to be that the AI won. The board has backed down after Altman rallied staff into a mass exodus. There's an implied promise of riches from the AI to those who develop it more quickly, and people care a lot about money and not much about small changes in x-risk. Of course this is a single example, but it is part of a pattern of people wanting to reap localized rewards from AI: recently the UK said it will refrain from regulating AI 'in the short term', and EU countries started lobbying to have foundation models excluded from regulation.

That is why you cannot just turn it off. People won't want to turn it off[1].

  1. There is a potential counterargument that once it becomes clear that AI is very dangerous, people will want to switch it off. But there is a conflicting constraint: it must also be possible to switch it off at that time. At early times, people may not take the threat seriously, and at late times they may take it seriously but be unable to switch the AI off because it is too powerful. ↩︎

25 comments

The board has backed down after Altman rallied staff into a mass exodus

[citation needed]

I've seen rumors and speculations, but if you're that confident, I hope you have some sources?


(for the record, I don't really buy the rest of the argument either on several levels, but this part stood out to me the most)

Well the board are in negotiations to have him back


"A source close to Altman says the board had agreed in principle to resign and to allow Altman and Brockman to return, but has since waffled — missing a key 5PM PT deadline by which many OpenAI staffers were set to resign. If Altman decides to leave and start a new company, those staffers would assuredly go with him."

"A source close to Altman" means "Altman", and I'm pretty sure he is not a very trustworthy party at the moment.

Well the new CEO is blowing kisses to him on Twitter


My comment is a bit of a tangent, and I'll be the first to admit I'm far from well informed in the area my question touches on. Maybe I should have put this as its own comment, but honestly I was not going to voice the thought until I read quetzal_rainbow's comment (and noted the karma and agreement).

A while back the EA community seemed to be really shocked by the FTX and Bankman-Fried fiasco (my word, clearly). The news stories I've seen suggest the OpenAI situation is also closely related to EA.

With two pretty big events fairly close to one another in time, should one update a bit regarding just how effective one might expect an EA approach to be, or perhaps at what scale it can work? Or should both be viewed as one-off events that don't really touch the core?

I think these are, judging from available info, kinda two opposite stories? The problem with SBF was that nobody inside EA was in a position to tell him "you are an asshole who steals clients' money, you are fired".

More generally, any attempt to do something more effective will blow up a lot of things, because trying to do something more effective than business-as-usual is an out-of-distribution problem, and you can't simply choose not to go outside the distribution.

Is OpenAI considered part of EA or an "EA approach"? My answer to this would be no. There's been some debate on whether OpenAI is net positive or net negative overall, but that's a much lower bar than being a maximally effective intervention. I've never seen any EA advocate donating to OpenAI.

I know it was started by Musk with the attempt to do good, but even that wasn't really EA-motivated, at least not as far as I know.

Open Philanthropy did donate $30M to OpenAI in 2017, and got in return the board seat that Helen Toner occupied until very recently. However, that was when OpenAI was a non-profit, and was done in order to gain some amount of oversight and control over OpenAI. I very much doubt any EA has donated to OpenAI unconditionally, or at all since then.

Would you leak that statement to the press if the board definitely wasn't planning these things, and you knew they weren't? I don't see how it helps you. Can you explain?

I don't have a strong opinion about Altman's trustworthiness, but I can assume he just isn't trustworthy and I still don't get doing this.

"The board definitely isn't planning this" is not the same as "the board has zero probability of doing this". It can be "the board would do this if you apply enough psychological pressure through the media".

Huh, whaddayaknow, turns out Altman was in the end pushed back, the new interim CEO is someone who is pretty safety-focused, and you were entirely wrong.


Normalize waiting for more details before dropping confident hot takes.

I should note that while your attitude is understandable, the event "Roko said his confident predictions out loud" is actually good, because we can evaluate his overconfidence and update our models accordingly.

Well, Altman is back in charge now... I don't think I'm being overconfident

"Estimate overconfidence" implies that estimate can be zero!

True. I may in fact have been somewhat underconfident here.

You're not taking your own advice. Since your message, Ilya has publicly backed down, and Polymarket has Sam coming back as CEO at coinflip odds: Polymarket | Sam back as CEO of OpenAI?

It seems that I was mostly right in the specifics: there was a lot of resistance to getting rid of Altman, and he is back (for now)

I agree with a small update in this direction.

The board has backed down after Altman rallied staff into a mass exodus.

How would that be bad if you were trying to shut it down? [On edit: how would the exodus be bad, not how would backing down be bad]

Especially because the people most likely to quit would be the ones driving the risky behavior?

The big problem would seem to be that they might (probably would/will) go off and recreate the danger elsewhere, but that's probably not avoidable anyway. If you don't act, they'll continue to do it under your roof. If you force them to go set up elsewhere, then at least you've slowed them down a bit.

And you might even be able to use the optics of the whole mess to improve the "you can do whatever you want as long as you're big enough" regulatory framework that seems to have been taking shape, partly under OpenAI's own influence. Probably not, but at least you can cause policymakers to perceive chaos and dissent, and perhaps think twice about whether it's a good idea to give the chaotic organizations a lot of rope.

Granted this all rests on unsubstantiated rumors and hypotheticals, but in a scenario in which the board said "shut it down this is too risky", doesn't the response suggest we're doomed either way? Either

a) Investors have more say than the board and want money, so the board resigns and SA is reinstated to pursue premier AGI status

b) Board holds firm in decision to oust SA, but all his employees follow him to a new venture and investors follow suit and they're up and running with no more meaningful checks on their pursuit of godlike AI

After some recent (surprising) updates in favor of "oh, maybe people are taking this more seriously than I expected and maybe there's hope", this ordeal leads me to update in the opposite direction: "we're in a full-speed-ahead arms race to AGI, and the only thing that will stop it is strong global interventionist government policy that is extremely unlikely". Not that the latter wasn't heavily weighted already, but this feels like the nail in the coffin.

Alternative framing: The board went after Altman with no public evidence of any wrongdoing. This appears to have backfired. If they had proof of significant malfeasance, and presented it to their employees, the story may have gone a lot differently. 

Applying this to the AGI analogy would be a statement that you can't shut down an AGI without proof that it is faulty or malevolent in some way. I don't fully agree, though: I think if a similar AGI design had previously committed a mass murder, people would be more willing to hit the off switch early.

In my view, this is an example of low-resolution thinking that occurs prior to us having direct experience in the actual mechanisms that would lead to actual danger. 

It's easy to imagine bad outcomes and thus activate associated emotions. Then we wish to (urgently) control those emotions through problem-solving.

However, our knowledge of the mechanism leading to the outcome is necessarily low resolution prior to the mechanism actually existing. So, we can only reason in terms of "switching it off", and other highly abstract concepts.

To address the argument: I don't think this is a case of the "AI winning" or an example of the "switch-off-ability" of AI. I think it just feels like it is.

As AI develops, I doubt this will ever be the useful question. I think there will be more boring questions that are hard to predict. In other words, we'll keep asking this question until an actual mechanism of danger exists, and we'll fairly easily know the right question at that point, and will solve it. 

I notice in myself the instinct to "play it safe", but I think this is downstream of a "fear of the unknown" emotional reflex.

With regard to the bigger question of AI safety, I think my argument here pushes me towards a lower risk assessment than I might have had before elucidating it. This is tangential though.

As a side note: perhaps this is why Sam has commented that he thinks the most sensible approach to AI safety is to keep releasing stuff incrementally. This builds collective knowledge of actual mechanisms and thus progressively improves the debate into more and more useful territory.

You can't just make up a series of facts and then claim that you've shown something not made up

I didn't make anything up. Altman is now back in charge BTW.

There's a whole part of the argument which is missing, which is the framing of this as being about AI risk.
I've seen various propositions for why this happened, and the board being worried about AI risk is one of them, but not the most plausible, afaict.

In addition, this is phrased similarly to technical problems like corrigibility, which it is very much not about.
People who say "why can't you just turn it off" typically refer to literally turning off the AI if it appears to be dangerous, which this is not about. This is about turning off the AI company, not the AI.