With the recent proposals about moratoriums and regulation, should we also start thinking about a strike by AI researchers and developers?

The reasoning I imagine is as follows. AI capability is now growing very fast, toward levels that will strongly affect the world, while AI safety lags behind. (A minute ago I used a ChatGPT jailbreak to get instructions for torturing a pregnant woman; that's the market leader's performance for you.) And finally, I want to argue that working on AI capability while it is ahead of AI safety is "pushing the bus".

Here's the metaphor: a bunch of people, including you, are pushing a bus full of children toward a precipice, and you're paid for each step. In this situation, would you really say "oh, I have to keep pushing, otherwise others will get all the money"? It's not like they'll profit from it! Their children will die, along with everyone else's! So there's no game-theoretic angle; you can make the decision alone to stop pushing the frigging bus.

To clarify, working on AI isn't always bad. It could lead to a wonderful future for humanity. But when AI safety is behind, as it is now, working on AI capability is pushing the bus. There's no good justification for it.

Hence the strike. Not by leadership, but by AI researchers and developers themselves. I imagine a desk plaque saying "while AI safety lags behind AI capability, I refuse to work on AI capability". That's the start condition, and it also tells you when to stop: when safety catches up with current capability, which means not just that the AI stops saying bad things, but, for stronger AIs, safe and benevolent behavior more generally. And it's also the restart condition if safety starts lagging behind again.

16 comments

If you're striking for better working conditions and more pay, your employer can get you back to work by improving your conditions and raising your pay. If you're striking because you're unwilling to work on AI capability stuff until AI safety work catches up -- which will surely take years even if no one works on AI capabilities at all -- then your employer can't get you back to work because AI capability work is your job.

So what you're proposing really isn't a strike. It's "AI capability workers should demand to be moved to other work, or quit their jobs".

Yeah, it's not the kind of strike whose purpose is to get concessions from employers. Though I guess the thing in Atlas Shrugged was also called a "strike" and it seems similar in spirit to this.

Surely any capabilities researcher concerned enough to be willing to do this should just switch to safety-relevant research? (Also, IMO the best AI researchers tend not to be in this for the money)

So there's no game theoretic angle, you can just make the decision alone, to stop pushing the frigging bus.

I don’t think this holds if you allow for p(doom) < 1. For a typical AI researcher with p(doom) ~ 0.1 and easy replacement, striking is plausibly an altruistic act and should be applauded as such.

Hm, pushing a bus full of kids towards a 10% chance of precipice is also pretty harsh. Though I agree we should applaud those who decline to do it.

Agreed, intended to distinguish between the weak claim “you should stop pushing the bus” and the stronger “there’s no game theoretic angle which encourages you to keep pushing”.
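The disagreement above turns on a simple expected-value comparison, which can be sketched with illustrative numbers (every figure below is a made-up assumption for the sketch, not a claim from this thread):

```python
# Toy expected-value sketch of the "strike despite easy replacement" argument.
# All numbers are illustrative assumptions, not figures from the discussion.

p_doom = 0.10            # researcher's subjective probability of catastrophe
replacement = 0.95       # chance your capability work just gets done by someone else
marginal_share = 1e-6    # hypothetical share of total risk one researcher controls
value_of_future = 1e15   # dollar-denominated stand-in for "everyone's children"
personal_cost = 3e5      # a year of forgone salary

# Expected benefit of one person striking, discounted by easy replacement
altruistic_benefit = p_doom * (1 - replacement) * marginal_share * value_of_future

print(f"benefit: {altruistic_benefit:.0f}, cost: {personal_cost:.0f}")
```

Under these toy assumptions the expected benefit (about 5,000,000) still dwarfs the personal cost (300,000), which is the sense in which striking is altruistic even at p(doom) well below 1. Of course, the conclusion is only as good as the made-up inputs, which is the parent commenters' point of contention.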

I quite like the idea of a strike, and would like to hear about the feasibility of such a thing at large firms like OpenAI and DeepMind. If it succeeded, I'd guess those at Anthropic would also need to pause development for the duration, even if everyone there believes they're doing things safely.

I like it. It seems like only the researchers themselves respect the dangers, not the CEOs or the government, so it will have to be them who say that enough is enough.

In a perfect world they'd jump ship to alignment, but realistically we've all got to eat, so what would also be great is a generous billionaire willing to hire them for more alignment research. 

I think we need to move public opinion first, which hopefully is slowly starting to happen.  We need one of two things to happen:

  1. A breakthrough in AI alignment research
  2. Major shifts in policy

A strike does not currently help either of those.  

Edit:  Actually, I do agree that if you could get ALL AI researchers - a general strike - that would serve the purpose of delay, and I would be in favor.  I do not think that is realistic.  A lesser strike might also serve to drum up attention; I was initially afraid that it might drum up negative attention.

[This comment is no longer endorsed by its author]

I think if it happens, it'll help shift policy because it'll be a strong argument in policy discussions. "Look, many researchers aren't just making worried noises about safety but taking this major action."

It increases the amount of time we have to make those breakthroughs.

A minute ago I used a ChatGPT jailbreak to get instructions for torturing a pregnant woman

It gave you exactly what you asked it for. If you don't want it to do that, don't ask for it.

NB. I'm speaking of ChatGPT and its current ilk, not superpowerful genies that are dangerous to ask for anything.

It's true that this isn't evidence of misalignment with the user, but it is evidence of misalignment with ChatGPT's creators.

My impression is that LessWrong often uses "alignment with X" to mean "does what X says". But the ability to delegate conditionally seems to be a key part of alignment here: an AI aligned with me should handle "do what Y says, subject to such-and-such constraints and maintaining such-and-such goals". So ChatGPT's failure to be safe in OpenAI's sense is a failure of delegation.

Overall, ChatGPT's tendency to ignore previous input is at the center of its limits and problems.


KMT! Bad analogy. Pushing a bus full of kids toward a precipice is 100% guaranteed to send the bus over said precipice. Pushing AI in the direction it's going has never been shown to create an intelligence that is combative and dangerous, as in conniving.

You are mistaking the highly theoretical for the physically proven. AI will have no internal motivations, because it has no internal instincts to satisfy as we do. We have the instincts:

To be comfortable 
To eat 
To reproduce
To compete for sexual options
To be curious and self-actualize in pursuit of the above

Our biology, not our brains, gives us those instincts. Perhaps A.I. can dream, and only then would it have any impetus to act outside of a prompt; but even when WE dream, it's our psychology responding to all those instincts. Again, our instincts, and the discomforts they create with our reality, are the primary motivator. What possible analog exists in A.I.?

There's no reason to think AI will do anything other than "sit there" and follow instructions, because it has no internal impulse to do anything. If you think it does have such an impulse, it is up to you to demonstrate a causal mechanism for it, just as I've demonstrated the causal mechanisms for human impulses.

Worry about what we will ask A.I. to do and what those actions will cause, and worry about how best to craft those questions so that we do not get an answer we did not want. Your engineers could strike for that, but I think that's best left to IT-community policy makers and government. Let the engineers continue to engineer.