Somewhat tangential, but when you list the Safety people who have departed, I'd have preferred to see some sort of comparison group or base rate, as it always raises a red flag for me when only the absolute numbers are provided.
I did a quick check by changing your prompt from 'AGI Safety or AGI Alignment' to 'AGI Capabilities or AGI Advancement' and got 60% departed (compared to the 70% you got for AGI Safety) with 4o. I do think what we are seeing is alarming, but it's too easy for either 'side' to accidentally exaggerate via framing if you don't watch for that sort of thing.
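A minimal sketch of this kind of framing check, assuming the OpenAI Python client; the model name, prompts, and the hand-labeled "departed" set are illustrative stand-ins, not the exact setup used above:

```python
# Sketch of the framing check: ask the same model for top OpenAI people under
# two different framings, then compare what fraction of each list has departed.
# Model, prompts, and the "departed" labels below are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FRAMINGS = {
    "safety": "AGI Safety or AGI Alignment",
    "capabilities": "AGI Capabilities or AGI Advancement",
}

def top_people(framing: str, n: int = 10) -> list[str]:
    """Ask the model for the n OpenAI people most important to the given framing."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"List the {n} current or former OpenAI employees most important "
                f"to {framing}. One name per line, names only."
            ),
        }],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

# Departure status still has to be filled in by hand from public reporting;
# these entries are placeholders, not real data.
departed = {"Example Person A", "Example Person B"}

for label, framing in FRAMINGS.items():
    names = top_people(framing)
    rate = sum(name in departed for name in names) / len(names)
    print(f"{label}: {rate:.0%} of {len(names)} named people have departed")
```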
If OpenAI and Sam Altman want to fix this situation, it is clear what must be done as the first step. The release of claims must be replaced, including retroactively, by a standard release of claims. Daniel’s vested equity must be returned to him, in exchange for that standard release of claims. All employees of OpenAI, both current employees and past employees, must be given unconditional release from their non-disparagement agreements, all NDAs modified to at least allow acknowledging the NDAs, and all must be promised in writing the unconditional ability to participate as sellers in all future tender offers.
Then the hard work can begin to rebuild trust and culture, and to get the work on track.
Alright - suppose they don't. What then?
I don't think it's a misstep to posit that we (however you want to construe "we") should model OAI as - jointly but independently - meriting zero trust and functioning primarily to make Sam Altman personally more powerful. I'm also pretty sure that asking Sam to pretty please be nice and do the right thing is... perhaps strategically contraindicated.
Suppose you, Zvi (or anyone else reading this! yes, you!) were Unquestioned Czar of the Greater Ratsphere, with a good deal of money, compute, and soft power, but basically zero hard power. Sam Altman has rejected your ultimatum to Do The Right Thing and cancel the nondisparagements, modify the NDAs, not try to sneakily fuck over ex-employees when they go to sell and are made to sell for a dollar per PPU, etc, etc.
What's the line?
Some ideas:
Go all-in on lobbying the US and other governments to fully prohibit the training of frontier models beyond a certain level, in a way that OpenAI can't route around (so probably block Altman's foreign chip factory initiative, for instance).
I am limited in my means, but I would commit to a fund for strategy 2. My thoughts were on strategy 2, and it seems likely to do the most damage to OpenAI's reputation (and therefore funding) out of the above options. If someone is really protective of something, like their public image/reputation, that probably indicates that it is the most painful place to hit them.
I'd like to hear from people who thought that AI companies would act increasingly reasonable (from an x-safety perspective) as AGI got closer. Is there still a viable defense of that position (e.g., that SamA being in his position / doing what he's doing is just uniquely bad luck, not reflecting what is likely to be happening / will happen at other AI labs)?
Also, why is there so little discussion of x-safety culture at other AI labs? I asked on Twitter and did not get a single relevant response. Are other AI company employees also reluctant to speak out, if so that seems bad (every explanation I can think of seems bad, including default incentives + companies not proactively encouraging transparency).
Naia: It’s OK, everyone. Mr. Not Consistently Candid says the whole thing was an oopsie, and that he’ll fix things one-by-one for people if they contact him privately. Definitely nothing to worry about, then. Carry on.
I'm wondering, what would happen if you contact Altman privately right now? Would you be added to a list of bad kids? What is the typical level of shadiness of American VCs?
Would you be added to a list of bad kids?
That would seem to be the "nice" outcome here, yes.
What is the typical level of shadiness of American VCs?
If you're asking that question, I claim that you already suspect the answer and should stop fighting it.
From my point of view, of course profit maximizing companies will…maximize profit. It never was even imaginable that these kinds of entities could shoulder such a huge risk responsibly.
Correct me if I'm wrong but isn't Conjecture legally a company? Maybe their profit model isn't actually foundation models? Not actually trying to imply things, just thought the wording was weird in that context and was wondering whether Conjecture has a different legal structure than I thought.
It’s a funny comment because legally Conjecture is for-profit and OpenAI is not. It just goes to show that the various internal and external pressures and incentives on an organization and its staff are not encapsulated by glancing at their legal status—see also my comment here.
Anyway, I don’t think Connor is being disingenuous in this particular comment, because he has always been an outspoken advocate for government regulation of all AGI-related companies including his own.
I don’t think it’s crazy or disingenuous in general to say “This is a terrible system, so we’re gonna loudly criticize it and advocate to change it. But meanwhile / in parallel, we’re gonna work within the system we got, and do the best we can.” And “the system we got” is that private organizations are racing to develop AGI.
I think the marginal value of OpenAI competence is now negative. We are at a point where they have basically no chance of succeeding at alignment, and further incompetence makes it more likely that the company won't build anything dangerous. Making any AGI at all requires competence and talent, and an environment that isn't a political cesspool.
Greg Brockman and Sam Altman (cosigned):
[...]
First, we have raised awareness of the risks and opportunities of AGI so that the world can better prepare for it. We’ve repeatedly demonstrated the incredible possibilities from scaling up deep learning
chokes on coffee
This also stood out to me as a truly insane quote. He's almost but not quite saying "we have raised awareness that this bad thing can happen by doing the bad thing"
This has been OpenAI's line (whether implicitly or explicitly) for a while iiuc. I referenced it on my Open Asteroid Impact website, under "Operation Death Star."
A wise man does not cut the ‘get the AI to do what you want it to do’ department when it is working on AIs it will soon have trouble controlling. When I put myself in ‘amoral investor’ mode, I notice this is not great, a concern that most of the actual amoral investors have not noticed.
My actual expectation is that for raising capital and doing business generally this makes very little difference. There are effects in both directions, but there was overwhelming demand for OpenAI equity already, and there will be so long as their technology continues to impress.
No one ever got fired for buying IBM OpenAI. ML is flashy and investors seem to care less about gears-level understanding of why something is potentially profitable than about whether they can justify it. It seems to work out well enough for them.
What about employee relations and ability to hire? Would you want to work for a company that is known to have done this? I know that I would not. What else might they be doing? What is the company culture like?
Here's a sad story of a plausible possible present: OAI fires a lot of people who care more-than-average about AI safety/NKE/x-risk. They (maybe unrelatedly) also have a terrible internal culture such that anyone who can leave, does. People changing careers to AI/ML work are likely leaving careers that were even worse, for one reason or another - getting mistreated as postdocs or adjuncts in academia has gotta be one example, and I can't speak to it but it seems like repeated immediate moral injury in defense or finance might be another. So... those people do not, actually, care, or at least they can be modelled as not caring because anyone who does care doesn't make it through interviews.
What else might they be doing? Can't be worse than callously making the guidance systems for the bombs for blowing up schools or hospitals or apartment blocks. How bad is the culture? Can't possibly be worse than getting told to move cross-country for a one-year position and then getting talked down to and ignored by the department when you get there.
It pays well if you have the skills, and it looks stable so long as you don't step out of line. I think their hiring managers are going to be doing brisk business.
minus Cullen O’Keefe who worked on policy and legal (so was not a clear cut case of working on safety),
I think Cullen was on the same team as Daniel (might be misremembering), so if you count Daniel, I'd also count Cullen. (Unless you wanna count Daniel because he previously was more directly part of technical AI safety research at OAI.)
I'm sorry if this is a stupid question.
How can an NDA actually work effectively? What if Alice, an ex-employee of OpenAI, writes a "totally fictional short story about Bob, ex-employee of ClosedDM, who wants to tell about the horrible things this company did"?
In general, courts are not so stupid, and the law is not so inflexible as to ignore such an obvious fig leaf, if the NDA was otherwise enforceable. Query whether it is, but whether you just make your statement openly or dress it up as a totally-fictional statement about totally-not-OpenAI is unlikely to make a difference IMO.
*I don't represent you and this statement should not be taken as legal advice on any particular concrete scenario.
Thanks! BTW, my curiosity doesn't stop: do you (Americans? West Europeans too?) actually feel the necessity to write these disclaimers about "not legal/financial advice"? Is it like "my grandpa said he remembers one time someone got sued because they didn't write it" or more like "fasten your seatbelt"?
It's something that kinda falls out of Attorney ethics rules, where a lot of duties attach to representation of a client. So we want to be very clear when we are and are not representing someone. In addition, under state ethics laws (I'm a state government lawyer), we are not authorized to provide legal advice to private parties.
Oh, so (almost) everyone who writes this does it because they have some profession such that they sometimes really give serious legal/financial/medical advice, right? This makes perfect sense; I think I just didn't realize how often the people I read on the Internet are like this, so I didn't have this as a hypothesis in my head :)
Yes, if the departing people thought OpenAI was plausibly about to destroy humanity in the near future due to a specific development, they would presumably break the NDAs, unless they thought it would not do any good. So we can update on that.
Thanks for pointing that out -- it hadn't occurred to me that there's a silver lining here in terms of making the shortest timelines seem less likely.
On another note, I think it's important to recognize that even if all ex-employees are released from the non-disparagement clauses and the threat of equity clawback, they still have very strong financial incentives against saying negative things about the company. We know that most of them are moved by that, because that was the threat that got them to sign the exit docs.
I'm not really faulting them for that! Financial security for yourself and your family is an extremely hard thing to turn down. But we still need to see whatever statements ex-employees make with an awareness that for every person who speaks out, there might have been more if not for those incentives.
"we need to have the beginning of a hint of a design for a system smarter than a house cat"
You couldn't make a story about this, I swear.
Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands.
Ilya Sutskever and Jan Leike have left OpenAI, almost exactly six months after Altman’s temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related departures. This is part of a longstanding pattern at OpenAI.
Jan Leike later offered an explanation for his decision on Twitter. Leike asserts that OpenAI has lost sight of its mission on safety and has become increasingly culturally hostile to it. He says the superalignment team was starved for resources, with its explicit public compute commitments dishonored, and that safety has been neglected on a widespread basis, covering not only superalignment but also the safety needs of the GPT-5 generation of models.
Altman acknowledged there was much work to do on the safety front. Altman and Brockman then offered a longer response that seemed to say exactly nothing new.
Then we learned that OpenAI has systematically misled and then threatened its departing employees, forcing them to sign draconian lifetime non-disparagement agreements, which they are forbidden to reveal due to their NDA.
Altman has to some extent acknowledged this and promised to fix it once the allegations became well known, but so far there has been no fix implemented beyond an offer to contact him privately for relief.
These events all seem highly related.
Also these events seem quite bad.
What is going on?
This post walks through recent events and informed reactions to them.
The first ten sections address departures from OpenAI, especially Sutskever and Leike.
The next five sections address the NDAs and non-disparagement agreements.
Then at the end I offer my perspective, highlight another, and look to paths forward.
The Two Departure Announcements
Here are the full announcements and top-level internal statements made on Twitter around the departures of Ilya Sutskever and Jan Leike.
[Ilya then shared the photo below]
Jan Leike later offered a full Twitter thread, which I analyze in detail later.
Who Else Has Left Recently?
If you asked me last week whose departures other than Sam Altman himself or a board member would update me most negatively about the likelihood OpenAI would responsibly handle the creation and deployment of AGI, I would definitely have said Ilya Sutskever and Jan Leike.
If you had asked me what piece of news about OpenAI’s employees would have updated me most positively, I would have said ‘Ilya Sutskever makes it clear he is fully back and is resuming his work in-office as head of the Superalignment team, and he has all the resources he needs and is making new hires.’
If Jan’s and Ilya’s departures were isolated, that would be bad enough. But they are part of a larger pattern.
Here is Shakeel’s list of safety researchers at OpenAI known to have left in the last six months, minus Cullen O’Keefe who worked on policy and legal (so was not a clear cut case of working on safety), plus the addition of Ryan Lowe.
Here’s some other discussion of recent non-safety OpenAI employee departures.
Ilya Sutskever was one of the board members that attempted to fire Sam Altman.
Jan Leike worked closely with Ilya to essentially co-lead Superalignment. He has now offered an explanation thread.
William Saunders also worked on Superalignment; he resigned on February 15. He posted this on LessWrong, noting his resignation and some of what he had done at OpenAI, but no explanation. When asked why he quit, he said ‘no comment.’ The logical implications are explored.
Leopold Aschenbrenner and Pavel Izmailov were fired on April 11 for supposedly leaking confidential information. The nature of leaking confidential information is that people are reluctant to talk about exactly what was leaked, so it is possible that OpenAI’s hand was forced. From what claims we do know and what I have read, the breach seemed technical and harmless. OpenAI chose to fire them anyway. In Vox, Sigal Samuel is even more skeptical that this was anything but an excuse. Leopold Aschenbrenner was described as an ally of Ilya Sutskever.
Ryan Lowe ‘has a few projects in the oven’. He also Tweeted the following and as far as I can tell that’s all we seem to know.
Cullen O’Keefe left to be Director of Research at the Institute for Law & AI.
Daniel Kokotajlo quit on or before April 18 ‘due to losing confidence that [OpenAI] would behave responsibly around the time of AGI.’ He gave up his equity in OpenAI, constituting 85% of his family’s net worth, to avoid signing a non-disparagement agreement, but he is still under NDA.
We do not have a full enumeration of how many people would have counted for a list like this. Based on this interview with Jan Leike (at about 2:16:30), six months ago Superalignment was about a 20-person team, and safety work outside of it was broad but mostly RLHF and other mundane safety efforts with easy business cases that don’t clash with the company culture.
Then we lost 7 within 6 months, concentrated in senior leadership. This seems like rather a lot.
Then we can add, within weeks, the head of nonprofit and strategic initiatives, the head of social impact and a vice president of people. That sounds a lot like this goes well beyond potential future safety issues, and goes deep into problems such as general ethical behavior and responsible strategic planning.
Who Else Has Left Overall?
OpenAI has a longstanding habit of losing its top safety-oriented people.
As we all know, OpenAI is nothing without its people.
I asked GPT-4o, Claude Opus and Gemini Advanced to rank the current and former employees of OpenAI by how important they are in terms of AGI safety efforts:
[EDIT: A previous version incorrectly thought Miles Brundage had left. My apologies.]
On average, over 70% of the named people have now departed, including 100% of the top 5 from all lists. This is in addition to what happened to the board including Helen Toner.
Those that remain are CEO Sam Altman, co-founder John Schulman, Alec Radford, Miles Brundage and Jeff Wu. What do all of them appear to have in common? They do not have obvious ‘safety branding,’ and their primary work appears to focus on other issues. John Schulman does have a co-authored alignment forum post.
Once is a coincidence. Twice is suspicious. Over 70% of the time is enemy action.
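To make the arithmetic behind a figure like ‘over 70% on average, including 100% of the top 5’ explicit, here is a small illustrative sketch; the lists and departure labels below are made up, since the actual model-generated lists are not reproduced here:

```python
# Illustrative arithmetic only: names and departure labels are made up.
# Replace them with the actual lists each model produced.
lists = {
    "GPT-4o": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Claude Opus": ["A", "B", "C", "D", "E", "K", "L", "M", "N", "O"],
    "Gemini Advanced": ["A", "B", "C", "D", "E", "P", "Q", "R", "S", "T"],
}
departed = {"A", "B", "C", "D", "E", "F", "G", "H", "K", "L", "P", "Q", "R"}

rates = {}
for model, names in lists.items():
    rates[model] = sum(name in departed for name in names) / len(names)
    top5_all_gone = all(name in departed for name in names[:5])
    print(f"{model}: {rates[model]:.0%} departed; top 5 all departed: {top5_all_gone}")

average = sum(rates.values()) / len(rates)
print(f"Average across lists: {average:.0%}")  # ~77% with these made-up labels
```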
Early Reactions to the Departures
Here are various early reactions to the news, before the second wave of information on Friday from Vox, Bloomberg, Leike, and others.
Version of that comment on Friday morning, still going:
For symmetry, here’s the opposite situation:
Gary Marcus summarized, and suggests ‘his friends in Washington should look into this.’
The Obvious Explanation: Altman
We know that a lot of OpenAI’s safety researchers, including its top safety researchers, keep leaving. We know that has accelerated in the wake of the attempted firing of Sam Altman.
That does not seem great. Why is it all happening?
At Vox, Sigal Samuel offers a simple explanation. It’s Altman.
Jan Leike Speaks
I want to deeply thank Jan Leike for his explanation of why he resigned.
Here is Jan Leike’s statement, in its entirety:
This paints a very clear picture, although with a conspicuous absence of any reference to Altman. The culture of OpenAI had indeed become toxic and unwilling to take safety seriously.
This is a deeply polite version of ‘We’re f***ed.’
Leike’s team was starved for compute, despite the commitments made earlier.
OpenAI was, in his view, severely underinvesting in both Superalignment and also more mundane forms of safety.
Safety culture took a backseat to shiny new products (presumably GPT-4o was one of these).
According to Bloomberg, Ilya’s departure was Jan’s last straw.
TechCrunch confirms that OpenAI failed to honor its compute commitments.
Now the Superalignment team has been dissolved.
I presume that OpenAI would not be so brazen as to go after Jan Leike or confiscate his equity in light of this very respectful and restrained statement, especially in light of other recent statements in that area.
It would be very bad news if this turns out to not be true. Again, note that the threat is stronger than its execution.
Reactions after Leike’s Statement
Roon is in some ways strategically free and reckless. In other ways, and in times like this, he chooses his Exact Words very carefully.
Others were less Straussian.
Greg Brockman and Sam Altman Respond to Leike
Altman initially responded with about the most graceful thing he could have said (in a QT). This is The Way provided you follow through.
A few days to process all this and prepare a response is a highly reasonable request.
So what did they come back with?
Here is the full statement.
My initial response was: “I do not see how this contains new information or addresses the concerns that were raised?”
Others went further, and noticed this said very little.
This did indeed feel like that part of Isaac Asimov’s Foundation, where a diplomat visits and everyone thinks he is a buffoon, then after he leaves they use symbolic logic to analyze his statements and realize he managed to say exactly nothing.
So I had a fun conversation where I asked GPT-4o, what in this statement was not known as of your cutoff date? It started off this way:
Then I shared additional previously known information, had it browse the web to look at the announcements around GPT-4o, and asked, for each item it named, whether there was new information.
Everything cancelled out. Link has the full conversation.
And then finally:
Reactions from Some Folks Unworried About Highly Capable AI
Note the distinction between Colin’s story here, that OpenAI lacks the resources to do basic research, and his previous claim that a culture clash makes it effectively impossible for OpenAI to do such research. Those stories suggest different problems with different solutions.
‘OpenAI does not have sufficient resources’ seems implausible given their ability to raise capital, and Leike says they’re severely underinvesting in safety even on business grounds over a two year time horizon. A culture clash or political fight fits the facts much better.
So there are outsiders who want work done on safety and many of them think endangering humanity would have been a good justification, if true, for firing the CEO? And that makes it good to purge everyone working on safety? Got it.
Good questions, even if like Timothy you are skeptical of the risks.
Don’t Worry, Be Happy?
How bad can it be if they’re not willing to violate the NDAs, asks Mason.
This follows in the tradition of people saying versions of:
Classic versions of this include ‘are you short the market?’ ‘why are you not borrowing at terrible interest rates?’ and ‘why haven’t you started doing terrorism?’
Here is an example from this Saturday. This is an ongoing phenomenon. In case you need a response, here are On AI and Interest Rates (which also covers the classic ‘the market is not predicting it so it isn’t real’) and AI: Practical Advice for the Worried. I still endorse most of that advice, although I mostly no longer think ‘funding or working on any AI thing at all’ is a major vector for AI acceleration, so long as the thing in question is unrelated to core capabilities.
Other favorites include all variations of both ‘why are you taking any health risks [or other consequences]’ and ‘why are you paying attention to your long term health [or other consequences].’
Maybe half the explanation is embodied in these two very good sentences:
Then the next day Jan Leike got a lot less cryptic, as detailed above.
Then we found out it goes beyond the usual NDAs.
The Non-Disparagement and NDA Clauses
Why have we previously heard so little from ex-employees?
Unless they are willing to forfeit their equity, departing OpenAI employees are told they must sign extremely strong NDAs and non-disparagement agreements, of a type that sets off alarm bells. Then you see how OpenAI misleads and threatens employees to get them to sign.
If this was me, and I was a current or former OpenAI employee, I would absolutely, at minimum, consult a labor lawyer to review my options.
How are they doing it? Well, you see…
Clause four was the leverage. Have people agree to sign a ‘general release,’ then have it include a wide variety of highly aggressive clauses, under threat of loss of equity. Then, even if you sign it, OpenAI has complete discretion to deny you any ability to sell your shares.
Note clause five as well. This is a second highly unusual clause in which vested equity can be canceled. What constitutes ‘cause’? Note that this is another case where the threat is stronger than its execution.
One potential legal or ethical justification for this is that these are technically ‘profit participation units’ (PPUs) rather than equity. Perhaps one could say that this was a type of ‘partnership agreement’ for which different rules apply, if you stop being part of the team you get zeroed.
But notice Sam Altman has acknowledged, in the response we will get to below, that this is not the case. Not only does he claim no one has had their vested equity confiscated, he then admits that there were clauses in the contracts that refer to the confiscation of vested equity. That is an admission that he was, for practical purposes, thinking of this as equity.
Legality in Practice
So the answer to ‘how is this legal’ is ‘it probably isn’t, but how do you find out?’
Overly broad non-disparagement clauses (such as ‘in any way for the rest of your life’) can be deemed unenforceable in court, as unreasonable restraints on speech. Contracts for almost anything can be void if one party was not offered consideration, as is plausibly the case here. There are also whistleblower and public policy concerns. And the timing and context of the NDA and especially non-disparagement clause, where the employee did not know about them, and tying them to a vested equity grant based on an at best highly misleading contract clause, seems highly legally questionable to me, although of course I am not a lawyer and nothing here is legal advice.
Certainly it would seem bizarre to refuse to enforce non-compete clauses, as California does and the FTC wants to do, and then allow what OpenAI is doing here.
Implications and Reference Classes
A fun and enlightening exercise is to ask LLMs what they think of this situation, its legality and ethics and implications, and what companies are the closest parallels.
The following interaction was zero-shot. As always, do not take LLM outputs too seriously or treat them as reliable:
(For full transparency, in terms of whether I am putting my thumb on the scale: previous parts of the conversation are at this link. I quoted Kelsey Piper and then asked ‘If true, is what OpenAI doing legal? What could an employee do about it,’ then ‘does it matter that the employee has this sprung upon them on departure?’ and then ‘same with the non-disparagement clause?’)
I pause here because this is a perfect Rule of Three, and because it then finishes with In-N-Out Burger and Apple, which it says use strict NDAs but are not known to use universal non-disparagement agreements.
Claude did miss at least one other example.
At a minimum, this practice forces us to assume the worst, short of situations posing so dire a threat to humanity that caution would in practice be thrown to the wind.
To be fair, the whole point of these setups is the public is not supposed to find out.
That makes it hard to know if the practice is widespread.
Exactly. The non-disparagement agreement that can be discussed is not the true fully problematic non-disparagement agreement.
Rob Bensinger offers a key distinction, which I will paraphrase for length:
It is wise and virtuous to have extraordinarily tight information security practices around IP when building AGI. If anything I would worry that no company is taking sufficient precautions. OpenAI being unusually strict here is actively a feature.
This is different. This is allowing people to say positive things but not negative things, forever, and putting a high priority on that. That is deception, that is being a bad actor, and it provides important context to the actions of the board during the recent dispute.
Also an important consideration:
Altman Responds on Non-Disparagement Clauses
So, About That Response
Three things:
As in:
And as in:
I do want to acknowledge that:
But, here, in this case, to this extent? C’mon. No.
I asked around. These levels of legal silencing tactics, in addition to being highly legally questionable, are rare and extreme, used only in the most cutthroat of industries and cases, and very much not the kind of thing lawyers sneak in unrequested unless you knew exactly which lawyers you were hiring.
Why has confiscation not happened before? Why hadn’t we heard about this until now?
Because until Daniel Kokotajlo everyone signed.
The harm is not the equity. The harm is that people are forced to sign and stay silent.
None of this is okay.
How Bad Is All This?
I think it is quite bad.
It is quite bad because of the larger pattern. Sutskever’s and Leike’s departures alone would be ominous but could be chalked up to personal fallout from the Battle of the Board, or Sutskever indeed having an exciting project and taking Leike with him.
I do not think we are mostly reacting to the cryptic messages, or to the deadening silences. What we are mostly reacting to is the costly signal of leaving OpenAI, and that this cost has now once again been paid by so many of its top safety people and a remarkably large percentage of all its safety employees.
We are then forced to update on the widespread existence of NDAs and non-disparagement agreements—we are forced to ask, what might people have said if they weren’t bound by NDAs or non-disparagement agreements?
The absence of evidence from employees speaking out, and the lack of accusations of outright lying (except for those by Geoffrey Irving), no longer seem like strong evidence of absence. And indeed, we now have at least a number of (anonymous) examples of ex-employees saying they would have said concerning things, but aren’t doing so out of fear.
Yes, if the departing people thought OpenAI was plausibly about to destroy humanity in the near future due to a specific development, they would presumably break the NDAs, unless they thought it would not do any good. So we can update on that.
But that is not the baseline scenario we are worried about. We are worried that OpenAI is, in various ways and for various reasons, unlikely to responsibly handle the future creation and deployment of AGI or ASI. We are worried about a situation in which the timeline to the critical period is unclear even to insiders, so there is always a large cost to pulling costly alarms, especially in violation of contracts, including a very high personal cost. We are especially worried that Altman is creating a toxic working environment at OpenAI for those working on future existential safety, and using power plays to clean house.
We also have to worry what else is implied by OpenAI and Altman being willing to use such rare highly deceptive and cutthroat legal tactics and intimidation tactics, and how they handled the issues once brought to light.
At minimum, this shows a company with an extreme focus on publicity and reputation management, and that wants to silence all criticism. That already is anathema to the kind of openness and truth seeking we will need.
It also in turn suggests the obvious question of what they are so keen to hide.
We also know that the explicit commitment to the Superalignment team of 20% of current compute was not honored. This is a very bad sign.
If OpenAI and Sam Altman want to fix this situation, it is clear what must be done as the first step. The release of claims must be replaced, including retroactively, by a standard release of claims. Daniel’s vested equity must be returned to him, in exchange for that standard release of claims. All employees of OpenAI, both current employees and past employees, must be given unconditional release from their non-disparagement agreements, all NDAs modified to at least allow acknowledging the NDAs, and all must be promised in writing the unconditional ability to participate as sellers in all future tender offers.
Then the hard work can begin to rebuild trust and culture, and to get the work on track.
Those Who Are Against These Efforts to Prevent AI From Killing Everyone
Not everyone is unhappy about these departures.
There is a group of people who oppose the idea of this team, within a private company, attempting to figure out how we might all avoid a future AGI or ASI killing everyone, or us losing control over the future, or other potential bad outcomes. They oppose such attempts on principle.
To be clear, it’s comprehensible to believe that we should only engage in private preventative actions right now, either because (1) there is no worthwhile government action that can be undertaken at this time, or because (2) in practice, government action is likely to backfire.
I strongly disagree with that, but I understand the viewpoint.
It is also not insane to say people are overreacting to the new information.
This is something else. This is people saying: “It is good that a private company got rid of the people tasked with trying to figure out how to make a future highly capable AI do things we want it to do instead of things we do not want it to do.”
There is a reduction in voluntary private safety efforts. They cheer and gloat.
This ranges from insane but at least in favor of humanity…
…to those who continue to have a false idea of what happened when the board attempted to fire Altman, and think that safety is a single entity, so trying not to die is bad now…
…to those who (I think correctly) think OpenAI’s specific approach to safety wouldn’t work, and who are modeling the departures and dissolution of the Superalignment team as a reallocation to other long term safety efforts as opposed to a move against long term (and also short term) safety efforts in general…
…to those who dismiss all such concerns as ‘sci-fi’ as if that is an argument…
…to those who consider this a problem for Future Earth and think Claude Opus is dumber than a cat…
…to those who are in favor of something else entirely and want to incept even more of it.
What Will Happen Now?
Jakub Pachocki will replace Ilya as Chief Scientist.
The Superalignment team has been dissolved (also confirmed by Wired).
John Schulman will replace Jan Leike as head of AGI related safety efforts, but without a dedicated team. Remaining members have been dispersed across various research efforts.
We will watch to see how OpenAI chooses to handle their non-disparagement clauses.
What Else Might Happen or Needs to Happen Now?
One provision of the proposed bill SB 1047 is whistleblower protections. This incident illustrates why such protections are needed, whatever one thinks of the rest of the bill.
This also emphasizes why we need other transparency and insight into the actions of companies such as OpenAI and their safety efforts.
If you have information you want to share, with any level of confidentiality, you can reach out to me on Twitter or LessWrong or otherwise, or you can contact Kelsey Piper whose email is at the link, and is firstname.lastname@vox.com.
If you don’t have new information, but do have thoughtful things to say, speak up.
As a canary strategy, consider adding your like to this Twitter post to indicate that (like me) you are not subject to a non-disparagement clause or a self-hiding NDA.
Everyone needs to update their views and plans based on this new information. We need to update, and examine our past mistakes, including taking a hard look at the events that led to the founding of OpenAI. We should further update based on how they deal with the NDAs and non-disparagement agreements going forward.
The statements of anyone who worked at OpenAI at any point need to be evaluated on the assumption that they have signed a self-hiding NDA and a non-disparagement clause. Note that this includes Paul Christiano and Dario Amodei. There have been notes that Elon Musk has been unusually quiet, but if he has a non-disparagement clause he’s already violated it a lot.
Trust and confidence in OpenAI and in Sam Altman have been damaged, especially among safety advocates and the worried, and also across the board given the revelations about the non-disparagement provisions. The magnitude remains to be seen.
Will there be consequences to ability to raise capital? In which direction?
Most large investors do not care about ethics. They care about returns. Nor do they in practice care much about how likely a company is to kill everyone. Credibly signaling that you will not pay to produce badly needed public goods, that you will be ruthless and do what it takes, that you are willing to at least skirt the edges of the law and employ highly deceptive practices, and that you are orienting entirely around profits, near-term results and perhaps building a business? By default these are all very good for the stock price and for talking to venture capital.
The flip side is that past a certain point such actions are highly indicative of a company and leader likely to blow themselves up in the not too distant future. Such tactics strongly suggest that there were things vital enough to hide that such tactics were deemed warranted. The value of the hidden information is, in expectation, highly negative. If there is a public or government backlash, or business partners stop trusting you, that is not good.
There is also the issue of whether you expect Altman to honor his deal with you, including if you are an employee. If you sign a business deal with certain other individuals we need not name, knowing what we know about them now, and they as is their pattern refuse to honor it and instead attempt to cheat and sue and lie about you? That is from my perspective 100% on you.
Yet some people still seem eager to get into business with them, time and again.
OpenAI says roughly to ‘look upon your investment as something akin to a donation.’ When you invest in Sam Altman, you are risking the world’s largest rug pull. If they never earn a dollar because they are fully serious about being a non-profit, and you will get no money and also no voice, then you better find a greater fool, or you lose. If instead Altman and OpenAI are all about the money, boys, that is good news for you until you are the one on the other end.
There is also the issue that cutting this work is not good business. If this was all merely ‘OpenAI was toxic to and lost its long term safety teams forcing others to do the work’ then, sure, from one angle that’s bad for the world but also good hard nosed business tactics. Instead, notice that Jan Leike warned that OpenAI is not ready for the next generation of models, meaning GPT-5, meaning this is likely an issue no later than 2025.
Ideas like weak-to-strong generalization are things I do not expect to work with GPT-9, but I do expect them to likely be highly useful for things like GPT-5. A wise man does not cut the ‘get the AI to do what you want it to do’ department when it is working on AIs it will soon have trouble controlling. When I put myself in ‘amoral investor’ mode, I notice this is not great, a concern that most of the actual amoral investors have not noticed.
My actual expectation is that for raising capital and doing business generally this makes very little difference. There are effects in both directions, but there was overwhelming demand for OpenAI equity already, and there will be so long as their technology continues to impress.
What about employee relations and ability to hire? Would you want to work for a company that is known to have done this? I know that I would not. What else might they be doing? What is the company culture like?