Disclaimer: I know approximately nothing about internal governance at AGI labs. My current understanding is that the main AGI labs have basically no plans for how to navigate a conditional pause, and this post is predicated on this assumption.

TL;DR: If an AGI lab pauses, it’s essential that their employees don’t defect and advance capabilities in other ways, e.g. by breaking the pause commitment within the company, or by leaving the company and accelerating other companies’ capabilities. If entering a pause would make many employees angry, this sets up strong incentives for labs to never pause. To fix this, I recommend that companies take measures to keep their capabilities-focused employees happy during a conditional pause, such as giving them plenty of early warning, other projects to work on, or multiple years of paid time off.

It’s currently against employee incentives to enter a conditional pause

If a major AGI lab entered a voluntary conditional pause tomorrow, this would probably be a pretty stressful event for the employees that were working on training frontier models. It would probably also cause a significant shift in company priorities, such that many high-ranking people would have to be switched from their current roles into other roles. There would probably be people that are left without much to do at all, and potentially fired from the company. I expect that entering a conditional pause would cause a significant fraction of a company’s employees to have a stressful few weeks/months.

If such a restructuring were to happen overnight, I wouldn’t be surprised if there were an employee backlash of similar magnitude to the one that happened when Sam Altman was fired. The employees of OpenAI, and partly by extension the employees of other AGI labs, know that they can fight for what they want and threaten to join competitor labs en masse if they disagree with leadership decisions.

AGI labs have strong incentives to avoid a conditional pause

AGI labs want to avoid their employees leaving. In the event of a conditional pause, the most affected employees would be the ones that work on their frontier model capabilities. There are a few reasons to avoid the loss of these employees in particular:

  • The employees that are pushing capabilities are often those with the highest status (compared to safety or product) and relatively long track records (as product employees are mostly recent hires). Through status and long-standing connections, they will have substantial leverage to mobilize the rest of the workforce to further their interests.
  • These people also have significant insider information and special skills that, if they moved to other AGI labs (which are not paused), would probably significantly speed up their AGI efforts and lead to their original lab being at a disadvantage.

If an AGI lab expected a large fraction of their top capabilities talent to leave their lab if they paused, this would:

  1. strongly incentivize AGI labs not to pause, and
  2. make AGI labs who pause systematically lose employees to labs that don't.

There are significant negative externalities to capabilities-focused employees joining other labs:

  • We ideally want to limit the number of AGI labs that even enter the dangerous regions of AI development that trigger a conditional pause period, as any company risks failing to trigger the conditional pause and entering increasingly dangerous regions of AI development.
  • This means that, if an AGI lab enters a pause, their employees would ideally stay at their original lab, as moving to other labs would push those labs' capabilities into more dangerous regions.

There are ways to improve employee incentives for a conditional pause

When an AGI lab starts to take drastic measures to reduce catastrophic risks, it's important that the employees are willing to make the necessary sacrifices, and that the decision makers can appropriately raise morale and support when making drastic decisions.

There are many ways to improve employee incentives:

  1. Give employees notice – possibly maintain internal prediction markets for the probability of a conditional pause in some time period, and put out an internal notice if a conditional pause is likely within the next year.
  2. Have a continuously updated restructuring plan such that fewer people are left without a role if a pause happens.
  3. Offer paid time off during the period of the pause for unhappy employees.
  4. Create a safety-focused culture – emphasize how virtuous it is to make small personal sacrifices in case the company needs to do large restructuring.
  5. Bigger picture – establish pause commitments between companies to avoid races to the bottom.

There are many ways to improve employee incentives:

One more extremely major one: ensure that you pay employees primarily in money that will retain its value if the company stops capabilities work, instead of trying to save money by paying employees partly in ownership of future profits (which will be vastly decreased if the company stops capabilities work).

Agreed. AGI labs should probably look into buying back their shares from employees to fix this retroactively.

I strongly agree with the high-level point that conditional pauses are unlikely to go well without planning for what employees will do during the pause. 

 A nitpick: while (afaik) Anthropic has made no public statements about their plans, their RSP does include a commitment to:

Proactively plan for a pause in scaling. We will manage our plans and finances to support a pause in model training if one proves necessary, or an extended delay between training and deployment of more advanced models if that proves necessary. During such a pause, we would work to implement security or other measures required to support safe training and deployment, while also ensuring our partners have continued access to their present tier of models (which will have previously passed safety evaluations).

While we would obviously prefer to meet all our ASL-3 commitments before we ever train an ASL-3 model, we have of course also thought about and discussed what we'd do in the other case. I expect that it would be a stressful time, but for reasons upstream of the conditional pause!

I suspect that when things have come to enough of a head that AI companies are thinking of pulling the internal alarm bell and doing a hard switch to safety, there's going to be a significant amount of interest from the government.

I anticipate a high likelihood that sufficiently powerful AI will represent a seriously destabilizing force on international power and relations. I think that this will probably lead to the governments whose countries the labs are based in deciding to nationalize all the labs and declare it illegal for AI researchers to emigrate or work for anyone but the government. I also suspect that the government will not agree to pause, even if the AI companies would have if left to their own devices. I believe the unavoidable military impact will pressure the government into going full-speed-ahead on military applications of AI. If there are some governments that do this, but some that don't, I expect that the world will quickly become militarily dominated by the governments that do. This could potentially be one government, like the US before anyone else had nuclear weapons. 

Basically, I anticipate that sufficiently powerful AI will unlock technology able to guarantee an enemy country will not be able to launch any land or sea based missiles or air attacks. That'd be a big deal. If the US government found out that OpenAI had discovered such a technology, how do you think the US military would react? They certainly wouldn't want to chance the tech leaking to Russia or China and thus disempowering the USA & allies.

I agree with the broader claim that as AGI approaches, governments are likely to intervene drastically to deal with national security threats.

However, I'm not so sure about the "therefore a global arms race will start" claim. I think it's pretty plausible that if the US or UK are the first to approach AGI, that they would come to their senses and institute a global pause instead of spearheading an arms race. Although maybe that's wishful thinking on my part.

I expect some people in the government to be like "wait, if a global arms race starts this is likely to end in catastrophe" and advocate for a pause instead. I think the US would be pretty happy with an enforceable pause if this meant it got to maintain a slight lead. I'd hope that (pause+slight lead) would be much more enticing than (race+large lead) given the catastrophic risk associated with the latter.

I agree actually. I think the three most likely branches of the future are:

a) a strong and well-enforced international treaty between all major powers which no country can opt out of, that successfully controls AI development and deployment.

b) a race-to-the-bottom of arms races, and first strikes, and increasing relative breakdowns in state monopoly on force as AI grants novel uncontrolled weapons tech to smaller groups (e.g. terrorist orgs).

c) (least likely, but still possible) a sudden jump in power of AI, which allows the controlling entity to seize power of the entire world in such a short time that state actors cannot react or stop them.

There are many ways to improve employee incentives

Yes. Actually, the best of those ways is to be ready to switch everyone to safety work in order to accomplish it as fast as possible and to be able to gradually end the pause sooner rather than later.

As you noted there will be other labs which are less safety-conscious compared to the labs ready to pause, and the last thing one wants is to see those other labs getting in the lead.

But for that to be possible the lab needs to start planning in advance, anticipating and planning a period of all-out safety work. It is a non-trivial task to be organizationally ready for something like that.

I think that in general, there aren't many examples of large portions of a large company suddenly switching what they're working on (on a timescale of days/weeks), and this seems pretty hard to pull off without very strong forces in play.

I guess some examples are how many companies had to shift their operations around a lot at the start of COVID, but this was very overdetermined, as the alternative was losing a lot of their profits. 

For AGI labs, if given a situation where they're uncertain if they should pause, it's less clear that they could rally large parts of their workforce to suddenly work on safety. I think planning for this scenario seems very good, including possibly having every employee not just have their normal role but also a "pause role", that is, a research project/team that they expect to join in case of a pause.

However, detailed planning for a pause is probably pretty hard, as the types of work you want to shift people to probably changes depending on what caused the pause.

I think that in general, there aren't many examples of large portions of a large company suddenly switching what they're working on (on a timescale of days/weeks), and this seems pretty hard to pull off without very strong forces in play.

I think this is actually easier and more common than you'd expect for software companies (see "reorgs" in the non-layoff sense). People get moved between wildly different teams all the time, and sometimes large portions of the company do this at once. It's unusual to reorg your biggest teams, but I think it could be done.

Note that you don't even need to change what everyone is working on since support work won't change nearly as much (if you have a sane separation of concerns). The people keeping servers running largely don't care what the servers are doing, the people collecting and cleaning data largely don't care what the data is being used for, etc.

The biggest change would be for your relatively small number of researchers, but even then I'd expect them to be able to come up with projects related to safety (if they're good, they probably already have ideas and just weren't prioritizing them).

Yes, in reality, as people are approaching the "danger zone", the safety work should gradually take up a more and more significant fraction.

I am impressed by OpenAI's commitment of 20% of meaningful resources to AGI/ASI safety, and by the recent impressive diversification of their safety efforts (which is great, since we really need to cover all promising directions), but I expect that closer to the real danger zone, 50% of resources or even more allocated to safety would not be unreasonable (of course, separation between safety and capability is tricky, a lot of safety enhancers do boost capabilities or do have strong potential to boost capabilities, so those percentages of allocations might become more nuanced; while those allocations would need to be against "pure capability" roles, a lot of mixed roles are likely to be involved, people collaborating with advanced AI systems on how to make it safe for everyone is an example of a quintessentially mixed role).

So, what I really think is that if there is a robust ongoing safety effort in a lab which is already measured in tens of percent of the overall force and resources, it should be relatively easy to gradually absorb the rest of the force and resources into that safety effort, if needed.

Whereas, if the safety effort is tiny and next to non-existent, then this is unlikely to work (and in such a case I would not expect a lab to be safety-conscious enough to pause anyway).


There are lots of other failure mechanisms here:

  1. AI labs that pause are throwing away their lead. This makes it a risky and ultimately defeatist move to spend their stashed billions paying employees not to further capabilities. Presumably each further advance in AI will cost more than the prior one, which is consistent with other industries and technologies.
  2. The whole bottleneck here is that over the past 5 years, since the discovery of the transformer, a tiny number of human beings have had the opportunity to use large GPU clusters to experiment with AI. The reason there is finite "talent" is that fewer than a few thousand people worldwide have ever had that access at all. This means that AI labs that pause and pay their talent not to work on capabilities are taking that finite talent off the market, but that will only be a bottleneck until tens of thousands of new people enter the field and learn it, which is what the market wants to happen.
  3. Recursive self-improvement (RSI) is a mechanism to substitute compute for talent, and ultimately to no longer need human talent at all. This could give less ethical labs who don't pause a way to get around the fact that they only have 'villains' on their payroll.
  4. National labs and national efforts. During the Cold War, tens of thousands of human beings worked eagerly to develop fusion-boosted nukes and load them on ICBMs, despite the obvious existential (at least to their entire nation) risk they all were aware they were contributing to. If history repeats, this is another mechanism for AI to be built by unethical parties.

There are so many ways for it to fail that success - actually pausing - is unlikely.  In the private world of just AI labs, without multiple major governments passing laws to restrict access to AI, it's pretty obvious that a pause is probably not possible at all.