Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.

Reasons are unclear (as usual when safety people leave OpenAI).

The NYT piece (archive) and others I've seen don't really have details.

OpenAI announced Sutskever's departure in a blogpost.

Sutskever and Leike confirmed their departures in tweets.


Friday May 17:

Superalignment dissolves.

Leike tweets, including:

I have been disagreeing with OpenAI leadership about the company's core priorities for quite some time, until we finally reached a breaking point.

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.

These problems are quite hard to get right, and I am concerned we aren't on a trajectory to get there.

Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.

Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.

But over the past years, safety culture and processes have taken a backseat to shiny products.

Daniel Kokotajlo talks to Vox:

“I joined with substantial hope that OpenAI would rise to the occasion and behave more responsibly as they got closer to achieving AGI. It slowly became clear to many of us that this would not happen,” Kokotajlo told me. “I gradually lost trust in OpenAI leadership and their ability to responsibly handle AGI, so I quit.” 

Kelsey Piper says:

I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.


TechCrunch says:

requests for . . . compute were often denied, blocking the [Superalignment] team from doing their work [according to someone on the team].

Piper is back:

OpenAI . . . says that going forward, they *won't* strip anyone of their equity for not signing the secret NDA.

(This is slightly good but OpenAI should free all past employees from their non-disparagement obligations.)

Saturday May 18:

OpenAI leaders Sam Altman and Greg Brockman tweet a response to Leike. It doesn't really say anything.

Separately, Altman tweets:

we have never clawed back anyone's vested equity, nor will we do that if people do not sign a separation agreement (or don't agree to a non-disparagement agreement). vested equity is vested equity, full stop.

there was a provision about potential equity cancellation in our previous exit docs; although we never clawed anything back, it should never have been something we had in any documents or communication. this is on me and one of the few times i've been genuinely embarrassed running openai; i did not know this was happening and i should have.

the team was already in the process of fixing the standard exit paperwork over the past month or so. if any former employee who signed one of those old agreements is worried about it, they can contact me and we'll fix that too. very sorry about this.

This seems to contradict various claims, including (1) OpenAI threatened to take all of your equity if you don't sign the non-disparagement agreement when you leave—the relevant question for evaluating OpenAI's transparency/integrity isn't whether OpenAI actually took people's equity, it's whether OpenAI threatened to—and (2) Daniel Kokotajlo gave up all of his equity. (Note: OpenAI equity isn't really equity, it's "PPUs," and I think the relevant question isn't whether you own the PPUs but rather whether you're allowed to sell them.)

No comment from OpenAI on freeing everyone from non-disparagement obligations.

It's surprising that Altman says he "did not know this was happening." I think Gwern and LW have been talking about this for a while. [Update: I failed to find this; I forget exactly why I feel like I was already aware of non-disparagement or where former OpenAI staff said "no comment" about such things.] Surely Altman knew that people leaving were signing non-disparagement agreements and would rather not... Oh, maybe he is talking narrowly about vested equity and OpenAI pseudo-equity is such that he's saying something technically true.


Added updates to the post. Updating it as stuff happens. Not paying much attention; feel free to DM me or comment with more stuff.


Reasons are unclear 

This is happening exactly 6 months after the November fiasco (the vote to remove Altman was on Nov 17th) which is likely what his notice period was, especially if he hasn't been in the office since then. 

Are the reasons really that unclear? The specifics of why he wanted Altman out might be, but he is ultimately clearly leaving because he didn't think Altman should be in charge, while Altman thinks otherwise.


But this is also right after GPT-4o, which, like Sora not that long ago, is a major triumph of the Sutskeverian vision of just scaling up sequence prediction for everything, and which OA has been researching for years (at least since CLIP/DALL-E 1, and possibly this effort for 2-3 years as 'Gobi'). I don't find it so hard to believe that he's held off until Sora & GPT-4o were out. These are the achievements of not just his lifetime, but hundreds of other people's lives (look at the contributor list). He's not going to quit anywhere before it. (Especially since by all accounts he's been gone the entire time, so what's a few more days or weeks of silence?)

Is there a particular reason to think that he would have had an exactly 6-month notice from the vote to remove Altman? And why would he have submitted notice then, exactly? The logical day to submit your quitting notice would be when the investigation report was submitted and was a complete Altman victory, which was not 6 months ago.

Pure speculation: The timing of these departures being the day after the big, attention-grabbing GPT-4o release makes me think that there was a fixed date for Ilya and Jan to leave, and OpenAI lined up the release and PR to drown out coverage. Especially in light of Ilya not (apparently) being very involved with GPT-4o.

Arthur Malone
It also occurs to me that the causality could go the other way: Ilya and Jan may have timed their departure to coincide with the 4o release for a number of reasons. If they go on to launch a new safety org soon, for example, I'd be more inclined to think that the timing of the two events was a result of Ilya/Jan trying to use the moment to their advantage.

The 21st, when Altman was reinstated, is a logical date for the resignation, and it is within a week of 6 months ago, which is why a notice period/agreement to wait ~half a year/something similar is the first thing I thought of, since obviously the ultimate reason why he is quitting is rooted in what happened around then.

Is there a particular reason to think that he would have had an exactly 6-month notice

You are right, there isn't, but 1, 3, 6 months is where I would have put the highest probability a priori.

Sora & GPT-4o were out.

Sora isn't out out, or at least not how 4o is out, and Ilya isn't listed as a contributor in any form on it (compared to being an 'additional contributor' for gpt-4 or 'additional leadership' for gpt-4o), and in general, I doubt it had that much to do with the timing.

GPT-4o, of course, makes a lot of sense timing-wise (it's literally the next day!) and he is listed on it (though not as one of the many contributors or leads). But if he wasn't in the office during that time (or is that just a rumor?), it's just not clear to me if he was actually participating in getting it out as his final project (which yes, is very plausible) or if he was just asked not to announce his departure until after the release, given that the two happen to be so close in time in that case.

Jacob L
If a six month notice period was the key driver of the timing, I would have very much expected to see the departure announced slightly more than six months from the notable events, rather than (very) slightly less than six months before the notable events.  Given Ilya was voting in the majority on November 17th, seems unlikely he would have already resigned six months before the public announcement.
When considering that, my thinking was that I'd expect the last day to be slightly after, but the announcement can be slightly before, since it doesn't need to fall exactly on the last day and often would come a little earlier, e.g. on the first day of his last week.
November 17 to May 16 is 180 days.   Pay periods often end on the 15th and end of the month, though at that level, I doubt that's relevant.
  Could you elaborate as to why you see GPT-4o as continuous with the scaling strategy? My understanding is this is a significantly smaller model than 4, designed to reduce latency and cost, which is then "compensated for" with multimodality and presumably many other improvements in architecture/data/etc. Isn't GPT-4o a clear break (temporary, I assume) with the singular focus on scaling of the past few years?
Alex Mallen
You seem to be talking primarily about Ilya, but it seems especially unclear why Jan is also leaving now.

That's good news.

There was a brief moment, back in 2023, when OpenAI's actions made me tentatively optimistic that the company was actually taking alignment seriously, even if its model of the problem was broken.

Everything that happened since then has made it clear that this is not the case; that all these big flashy commitments like Superalignment were just safety-washing and virtue signaling. They were only going to do alignment work inasmuch as that didn't interfere with racing full-speed towards greater capabilities.

So these resignations don't negatively impact my p(doom) in the obvious way. The alignment people at OpenAI were already powerless to do anything useful regarding changing the company direction.

On the other hand, what these resignations do is showcase that fact. Inasmuch as Superalignment was a virtue-signaling move meant to paint OpenAI as caring deeply about AI Safety, so many people working on it resigning or getting fired starkly signals the opposite.

And it's good to have that more in the open; it's good that OpenAI loses its pretense.

Oh, and it's also good that OpenAI is losing talented engineers, of course.

Wei Dai

So these resignations don’t negatively impact my p(doom) in the obvious way. The alignment people at OpenAI were already powerless to do anything useful regarding changing the company direction.

How were you already sure of this before the resignations actually happened? I of course had my own suspicions that this was the case, but was uncertain enough that the resignations are still a significant negative update.

ETA: Perhaps worth pointing out here that Geoffrey Irving recently left Google DeepMind to be Research Director at UK AISI, but seemingly on good terms (since Google DeepMind recently reaffirmed its intention to collaborate with UK AISI).

Thane Ruthenis
OpenAI enthusiastically commercializing AI + the "Superalignment" approach being exactly the approach I'd expect someone doing safety-washing to pick + the November 2023 drama + the stated trillion-dollar plans to increase worldwide chip production (which are directly at odds with the way OpenAI previously framed its safety concerns). Some of the preceding resignations (chiefly, Daniel Kokotajlo's) also played a role here, though I didn't update off of them much either.
It's not clear to me that it was just safety-washing and virtue signaling. I think a better model is something like: there are competing factions within OAI that have different views, that have different interests, and that, as a result, prioritize scaling/productization/safety/etc. to varying degrees. Superalignment likely happened because (a) the safety faction (Ilya/Jan/etc.) wanted it, and (b) the Sam faction also wanted it, or tolerated it, or agreed to it due to perceived PR benefits (safety-washing), or let it happen as a result of internal negotiation/compromise, or something else, or some combination of these things.

If OAI as a whole was really only doing anything safety-adjacent for pure PR or virtue signaling reasons, I think its activities would have looked pretty different. For one, it probably would have focused much more on appeasing policymakers than on appeasing the median LessWrong user. (The typical policymaker doesn't care about the superalignment effort, and likely hasn't even heard of it.) It would also not be publishing niche (and good!) policy/governance research. Instead, it would probably spend that money on actual PR (e.g., marketing campaigns) and lobbying.

I do think OAI has been tending more in that direction (that is, in the direction of safety-washing, and/or in the direction of just doing less safety stuff period). But it doesn't seem to me like it was predestined. I.e., I don't think it was "only going to do alignment work inasmuch as that didn't interfere with racing full-speed towards greater capabilities". Rather, it looks to me like things have tended that way as a result of external incentives (e.g., looming profit, Microsoft) and internal politics (in particular, the safety faction losing power). Things could have gone quite differently, especially if the board battle had turned out differently. Things could still change, the trend could still reverse, even though that seems improbable right now.
Thane Ruthenis
Sure, that's basically my model as well. But if the faction (b) only cares about alignment due to perceived PR benefits or in order to appease faction (a), and faction (b) turns out to have overriding power such that it can destroy or drive out faction (a) and then curtail all the alignment efforts, I think it's fair to compress all that into "OpenAI's alignment efforts are safety-washing". If (b) has the real power within OpenAI, then OpenAI's behavior and values can be approximately rounded off to (b)'s behavior and values, and (a) is a rounding error.

Not if (b) is concerned about fortifying OpenAI against future challenges, such as hypothetical futures in which the AGI Doomsayers get their way and the government/the general public wakes up and tries to nationalize or ban AGI research. In that case, having a prepared, well-documented narrative of going above and beyond to ensure that their products are safe, well before any other parties woke up to the threat, will ensure that OpenAI is much more well-positioned to retain control over its research.

(I interpret Sam Altman's behavior at Congress as evidence for this kind of longer-term thinking. He didn't try to downplay the dangers of AI, which would be easy and what someone myopically optimizing for short-term PR would do. He proactively brought up the concerns that future AI progress might awaken, getting ahead of it, and thereby established OpenAI as taking them seriously and put himself into the position to control/manage these concerns.)

And it's approximately what I would do, at least, if I were in charge of OpenAI and had a different model of AGI Ruin. And this is the potential plot whose partial failure I'm currently celebrating.

Note that the NYT article is by Cade Metz.

I am lacking context, why is this important?


Cade Metz was the NYT journalist who doxxed Scott Alexander. IMO he has also displayed a somewhat questionable understanding of journalistic competence and integrity, and seems to be quite into narrativizing things in a weirdly adversarial way (I don't think it's obvious how this applies to this article, but it seems useful to know when modeling the trustworthiness of the article).


This, and see also Gwern's comment here.

My comment here is not cosmically important and I may delete it if it derails the conversation. There are times when I would really want a friend to tap me on the shoulder and say "hey, from the outside the way you talk about <X> seems way worse than normal. Are you hungry/tired/too emotionally close?". They may be wrong, but often they're right. If you (general reader you) would deeply want someone to tap you on the shoulder, read on, otherwise this comment isn't for you. If you burn at NYT/Cade Metz intolerable hostile garbage, and you have not taken into account how defensive tribal instincts can cloud judgements, then, um <tap tap>?

FWIW, Cade Metz was reaching out to MIRI and some other folks in the x-risk space back in January 2020, and I went to read some of his articles and came to the conclusion in January that he's one of the least competent journalists -- like, most likely to misunderstand his beat and emit obvious howlers -- that I'd ever encountered. I told folks as much at the time, and advised against talking to him just on the basis that a lot of his journalism is comically bad and you'll risk looking foolish if you tap him.

This was six months before Metz caused SSC to shut down and more than a year before his hit piece on Scott came out, so it wasn't in any way based on 'Metz has been mean to my friends' or anything like that. (At the time he wasn't even asking around about SSC or Scott, AFAIK.)

(I don't think this is an idiosyncratic opinion of mine, either; I've seen other non-rationalists I take seriously flag Metz as someone unusually out of his depth and error-prone for a NYT reporter, for reporting unrelated to SSC stuff.)

I think it is useful for someone to tap me on the shoulder and say "Hey, this information you are consuming, it's from <this source that you don't entirely trust and have a complex causal model of>".

Enforcing social norms to prevent scapegoating also destroys information that is valuable for accurate credit assignment and causally modelling reality. I haven't yet found a third alternative, and until then, I'd recommend people both encourage and help people in their community to not scapegoat or lose their minds in 'tribal instincts' (as you put it), while not throwing away valuable information.

You can care about people while also seeing their flaws and noticing how they are hurting you and others you care about.

I really appreciate this comment! And yeah, that's why I said only "Note that...", and not something like "don't trust this guy". I think the content of the article is probably true, and maybe it's Metz who wrote it just because AI is his beat. But I do also hold tiny models that say "maybe he dislikes us" and also something about the "questionable understanding" etc that habryka mentions below. AFAICT I'm not internally seething or anything, I just have a yellow-flag attached to this name.

FWIW I do think "don't trust this guy" is warranted; I don't know that he's malicious, but I think he's just exceptionally incompetent relative to the average tech reporter you're likely to see stories from.

Like, in 2018 Metz wrote a full-length article on smarter-than-human AI that included the following frankly incredible sentence:

During a recent Tesla earnings call, Mr. Musk, who has struggled with questions about his company’s financial losses and concerns about the quality of its vehicles, chastised the news media for not focusing on the deaths that autonomous technology could prevent — a remarkable stance from someone who has repeatedly warned the world that A.I. is a danger to humanity.


So out of the twelve people on the weak-to-strong generalization paper, four have since left OpenAI? (Leopold, Pavel, Jan, and Ilya)

Other recent safety related departures that come to mind are Daniel Kokotajlo and William Saunders.

Am I missing anyone else?

Cullen O'Keefe is also no longer at OpenAI (as of last month).

Two other executives left two weeks ago, but that's not obviously safety-related.

Nathan Helm-Burger
The executives (Diane Yoon, vice president of people, and Chris Clark, head of nonprofit and strategic initiatives) left the company earlier this week, a company spokesperson said.
Evan Morikawa?
Eric Neyman
My Manifold market on Collin Burns, lead author of the weak-to-strong generalization paper

After returning to OpenAI just five days after he was ousted, Mr. Altman reasserted his control and continued its push toward increasingly powerful technologies that worried some of his critics. Dr. Sutskever remained an OpenAI employee, but he never returned to work.

Had this been known until now? I didn't know he "didn't get back to work" although admittedly I wasn't tracking the issue very closely.

Yeah, it's been a bit of a meme ("where is Ilya?"). See e.g. Gwern's comment thread here.

Mateusz Bagiński
Yeah, that meme did reach me. But I was just assuming Ilya got back (was told to get back) to doing the usual Ilya superalignment things and decided (was told) not to stick his neck out.
He might have returned to work, but agreed to no external comms.

Interesting to watch Sam Altman talk about it here at timestamp 18:40: 

Notably, this interview was on March 18th, and afaik the highest-level interview Altman has had to give his two cents since the incident. There's a transcript here. (There was also this podcast a couple days ago). I think a Dwarkesh-Altman podcast would be more likely to arrive at more substance from Altman's side of the story. I'm currently pretty confident that Dwarkesh and Altman are sufficiently competent to build enough trust to make sane and adequate pre-podcast agreements (e.g. don't be an idiot who plays tons of one-shot games just because podcast cultural norms are more vivid in your mind than game theory), but I might be wrong about this; trailblazing the frontier of making-things-happen, like Dwarkesh and Altman are, is a lot harder than thinking about the frontier of making-things-happen.
Seth Herd
There's also this podcast from just yesterday. It's really good. Sam continues to say all the right things; in fact, I think this is the most reassuring he's ever been on taking the societal risks seriously, if not necessarily the existential risks. Which leaves me baffled. He's certainly a skilled enough social engineer to lie convincingly, but he sounds so dang sincere. I'm weakly concluding for the moment that he just doesn't think the alignment problem is that hard. I think that's wrong, but the matter is curiously murky, so it's not necessarily an irrational opinion to hold. Getting more meaningful discussion between optimistic and pessimistic alignment experts would help close that gap.

He has a stance towards risk that is a necessary condition for becoming the CEO of a company like OpenAI, but doesn't give you a high probability of building a safe ASI:

There are a few reasons why, even though Sam is surrounded by people like Ilya and Jan warning him of the consequences, he's currently unwilling to change course.
1. Staring down AGI ruin is just fundamentally a deeply scary abyss. Doubly so when it would've been Sam himself at fault.
2. Like many political leaders, he's unable to let go of power because he believes that he himself is most capable of wielding it, even when it causes them to take actions that are worse than either leader would do independently if they weren't trying to wrest power away from the other.
3. More than most, Sam envisions the upside. He's a vegetarian and cares a lot about animal suffering ("someday eating meat is going to get cancelled, and we are going to look back upon it with horror. we have a collective continual duty to find our way to better morals"), seeing the power of AGI to stop that suffering. He talks up the curing cancer and objectively good tech progress a lot. He probably sees it as: heads, I live forever and am the greatest person for bringing utopia, tails, I'm dead anyway like I would be in 50 years.
Ultimately, I think Sama sees the coin flip (while believing the utopia outcome is the more likely side) and, being the ever-optimist, is willing to flip it for the good things because it's too scary to believe that the abyss side would happen.

Actually, as far as I know, this is wrong. He simply hasn’t been back to the offices but has been working remotely.

This article goes into some detail and seems quite good.

Guaranteeing all the safety people that left OpenAI that any legal fees for breaking their NDA would be fully compensated might be a very effective intervention.

On first order, this might have a good effect on safety.

On second order, it might have negative effects, because it increases the risk of, and therefore lowers the rate of, such companies hiring people who openly worry about AI X-Risk.

On third order, people who openly worry about X-Risk may get influenced by their environment, becoming less worried as a result of staying with a company whose culture denies X-Risk, which could eventually even cause them to contribute negatively to AI Safety. Preventing them from getting hired prevents this.

In my opinion, a class action filed by all employees allegedly prejudiced (I say allegedly here, reserving the right to change 'prejudiced' in the event that new information arises) by the NDAs and gag orders would be very effective.

Were they to seek termination of these agreements on the basis of public interest in an arbitral tribunal, rather than a court or internal bargaining, the ex-employees are far more likely to get compensation. The litigation costs of legal practitioners there also tend to be far less.

Again, this assumes that the agreements they signed didn't also waive the right to class action arbitration. If OpenAI does have agreements this cumbersome, I am worried about the ethics of everything else they are pursuing.

For further context, see:

Even acknowledging that the NDA exists is a violation of it.

This sticks out pretty sharply to me.

Was this explained to the employees during the hiring process? What kind of precedent is there for this kind of NDA? 


Was this explained to the employees during the hiring process? What kind of precedent is there for this kind of NDA?

See Kelsey's follow-up reporting on this.

Thanks for the source.

I've intentionally made it difficult for myself to log into twitter. For the benefit of others who avoid Twitter, here is the text of Kelsey's tweet thread:

I'm getting two reactions to my piece about OpenAI's departure agreements: "that's normal!" (it is not; the other leading AI labs do not have similar policies) and "how is that legal?" It may not hold up in court, but here's how it works:

OpenAI like most tech companies does salaries as a mix of equity and base salary. The equity is in the form of PPUs, 'Profit Participation Units'. You can look at a recent OpenAI offer and an explanation of PPUs here:

Many people at OpenAI get more of their compensation from PPUs than from base salary. PPUs can only be sold at tender offers hosted by the company. When you join OpenAI, you sign onboarding paperwork laying all of this out.

And that onboarding paperwork says you have to sign termination paperwork with a 'general release' within sixty days of departing the company. If you don't do it within 60 days, your units are cancelled. No one I spoke to at OpenAI gave this little line much thought.

And yes this is talking about vested units, because a

... (read more)
Thanks, but this doesn't really give insight on whether this is normal or enforceable. So I wanted to point out, we don't know if it's enforceable, and have not seen a single legal opinion.
I am not a lawyer, and my only knowledge of this agreement comes from the quote above, but... if the onboarding paperwork says you need to sign "a" general release, but doesn't describe the actual terms of that general release, then it's hard for me to see an interpretation that isn't either toothless or crazy:
1. If you interpret it to mean that OpenAI can write up a "general release" with absolutely any terms they like, and you have to sign that or lose your PPUs, then that seems like it effectively means you only keep your PPUs at their sufferance, because they could simply make the terms unconscionable. (In general, any clause that requires you to agree to "something" in the future without specifying the terms of that future agreement is a blank check.)
2. If you interpret it to mean either that the employee can choose the exact terms, or that the terms must be the bare minimum that would meet the legal definition of "a general release", then that sounds like OpenAI has no actual power to force the non-disclosure or non-disparagement terms--although they could very plausibly trick employees into thinking they do, and threaten them with costly legal action if they resist. (And once the employee has fallen for the trick and signed the NDA, the NDA itself might be enforceable?)
3. Where else are the exact terms of the "general release" going to come from, if they weren't specified in advance and neither party has the right to choose them?

In case people missed this, another safety researcher recently left OpenAI: Ryan Lowe.

I don't know Ryan's situation, but he was a "research manager working on AI alignment."

Noting that while Sam describes the provision as being “about potential equity cancellation”, the actual wording says ‘shall be cancelled’ not ‘may be cancelled’, as per this tweet from Kelsey Piper:


It'll be interesting to see if OpenAI will keep going with their compute commitments now that the two main superalignment leads have left. 

The commitment—"20% of the compute we've secured to date" (in July 2023), to be used "over the next four years"—may be quite little in 2027, with compute use increasing exponentially. I'm confused about why people think it's a big commitment.


It seems like it was a big commitment because there were several hints during the OpenAI coup reporting that Superalignment was not getting the quota as OA ran very short on compute in 2023, creating major internal stress (particularly from Sam Altman telling people different things or assigning the same job) and that was one of the reasons for Altman sidelining Ilya Sutskever in favor of Jakub Pachocki. What sounded good & everyone loved initially turned out to be a bit painful to realize. (Sort of like designing the OA LLC so the OA nonprofit board could fire the CEO.)

EDIT: speak of the devil: Note Leike has to be very cautious in his wording. EDITEDIT: and further confirmed as predating the coup and thus almost certainly contributing to why Ilya flipped.

I agree it's not a large commitment in some absolute sense. I think it'd still be instructive to see whether they're able to hit this (not very high) bar.
Instead of the compute, can we have some extra time instead, i.e., a pause in capabilities research?

It is hard to pinpoint motivation here. If you are a top researcher at a top lab working on alignment and you disagree with something within the company, I see two categories of options you can take to try to fix things

  • Stay and try to use your position of power to do good. Better that someone who deeply cares about AI risk is in charge than someone who doesn't
  • Leave in protest to try to sway public opinion into thinking that your organization is unsafe and that we should not trust it

Jan and Ilya left but haven't said much about how they lost confidence in OpenAI. I expect we will see them making more damning statements about OpenAI in the future

Or is there a possible motivation I'm missing here?

It seems likely (though not certain) that they signed non-disparagement agreements, so we may not see more damning statements from them even if that's how they feel. Also, Ilya at least said some positive things in his leaving announcement, so that indicates either that he caved in to pressure (or too high agreeableness towards former co-workers) or that he's genuinely not particularly worried about the direction of the company and that he left more because of reasons related to his new project. 

Someone serious about alignment who sees dangers had better do what is safe and not be influenced by a non-disparagement agreement. It might cost them some job prospects, money, and possible lawsuit costs, but if history on Earth is on the line? Especially since such a well-known AI genius would find plenty of support from people who back such an open move.

So I hope he simply considers talking right NOW not strategically worth it. E.g., he might want to increase his chance of being hired by a semi-safety-serious company (more serious than OpenAI, but not enough to hire a proven whistleblower), where he can use his position better.

I agree with what you say in the first paragraph. If you're talking about Ilya, which I think you are, I can see what you mean in the second paragraph, but I'd flag that even if he had some sort of plan here, it seems pretty costly and also just bad norms for someone with his credibility to say something that indicates that he thinks OpenAI is on track to do well at handling their great responsibility, assuming he were to not actually believe this. It's one thing to not say negative things explicitly; it's a different thing to say something positive that rules out the negative interpretations. I tend to take people at their word if they say things explicitly, even if I can assume that they were facing various pressures. If I were to assume that Ilya is saying positive things that he doesn't actually believe, that wouldn't reflect well on him, IMO. 

If we consider Jan Leike's situation, I think what you're saying applies more easily, because him leaving without comment already reflects poorly on OpenAI's standing on safety, and maybe he just decided that saying something explicitly doesn't really add a ton of information (esp. since maybe there are other people who might be in a...

Charlie Steiner
Well, one big reason is if they were prevented from doing the things they thought would constitute using their position of power to do good, or were otherwise made to feel that OpenAI wasn't a good environment for them.
Leaving to dissuade others within the company is another possibility.
I assume they can't make a statement and that their choice of next occupation will be the clearest signal they can and will send out to the public.

How many safety-focused people have left since the board drama now? I count 7, but I might be missing more. Ilya Sutskever, Jan Leike, Daniel Kokotajlo, Leopold Aschenbrenner, Cullen O'Keefe, Pavel Izmailov, William Saunders.

This is a big deal. A bunch of the voices that could raise safety concerns at OpenAI when things really heat up are now gone. Idk what happened behind the scenes, but they judged now is a good time to leave.

Possible effective intervention: Guaranteeing that if these people break their NDAs, all their legal fees will be covered. No idea how sensible this is, so agree/disagree voting encouraged.


Ilya's departure is momentous.

What do we know about those other departures? The NYT article has this:

Jan Leike, who ran the Super Alignment team alongside Dr. Sutskever, has also resigned from OpenAI. His role will be taken by John Schulman, another company co-founder.

I have not been able to find any other traces of this information yet.

We do know that Pavel Izmailov has joined xAI:

Leopold Aschenbrenner still lists OpenAI as his affiliation everywhere I see. The only recent traces of his activity seem to be likes on Twitter:


Jan Leike confirms:

Dwarkesh is supposed to release his podcast with John Schulman today, so we can evaluate the quality of his thinking more closely (he is mostly known for reinforcement learning, although he has some track record of safety-related publications, including Unsolved Problems in ML Safety, 2021-2022, and Let's Verify Step by Step, which includes Jan Leike and Ilya Sutskever among its co-authors).

No confirmation of him becoming the new head of Superalignment yet...


The podcast is here:

From reading the first 29 min of the transcript, my impression is: he is strong enough to lead an org to an AGI (it seems many people are strong enough to do this from our current level, the conversation does seem to show that we are pretty close), but I don't get the feeling that he is strong enough to deal with issues related to AI existential safety. At least, that's what my initial impression is :-(


This interview was terrifying to me (and I think to Dwarkesh as well). Schulman continually demonstrates that he hasn't really thought about the AGI future scenarios in that much depth and sort of handwaves away any talk of future dangers.

Right off the bat he acknowledges that they reasonably expect AGI in 1-5 years or so, and even though Dwarkesh pushes him he doesn't present any more detailed plan for safety than "Oh we'll need to be careful and cooperate with the other companies...I guess..."

Here is my coverage of it. Given this is a 'day minus one' interview of someone in a different position, and given everything else we already know about OpenAI, I thought this went about as well as it could have. I don't want to see false confidence in that kind of spot, and the failure of OpenAI to have a plan for that scenario is not news.

I have so much more confidence in Jan and Ilya. Hopefully they go somewhere to work on AI alignment together. The critical time seems likely to be soon. See this clip from an interview with Jan: 

[Edit: watched the full interview with John and Dwarkesh. John seems kinda nervous, caught a bit unprepared to answer questions about how OpenAI might work on alignment. Most of the interesting thoughts he put forward for future work were about capabilities. Hopefully he does delve deeper into alignment work if he's going to remain in charge of it at OpenAI.]

He removed the mention of xAI and listed Anthropic as his next job (on all his platforms). His CV says that he is doing Scalable Oversight at Anthropic. His LinkedIn also states the end of his OpenAI employment as May 2024. His Twitter does say "ex-superalignment" now.
Leopold and Pavel were out ("fired for allegedly leaking information") in April.

Edit: never mind; maybe this tweet is misleading and narrow and just about restoring people's vested equity; I'm not sure what that means in the context of OpenAI's pseudo-equity but possibly this tweet isn't a big commitment.

@gwern I'm interested in your take on this new Altman tweet:

we have never clawed back anyone's vested equity, nor will we do that if people do not sign a separation agreement (or don't agree to a non-disparagement agreement). vested equity is vested equity, full stop.

there was a provision about potential equity cancellation in our previous exit docs; although we never clawed anything back, it should never have been something we had in any documents or communication. this is on me and one of the few times i've been genuinely embarrassed running openai; i did not know this was happening and i should have.

the team was already in the process of fixing the standard exit paperwork over the past month or so. if any former employee who signed one of those old agreements is worried about it, they can contact me and we'll fix that too. very sorry about this.

In particular "i did not know this was happening"

What do you mean by pseudo-equity?
OpenAI has something called PPUs ("Profit Participation Units"), which in theory are supposed to act like RSUs, albeit with a capped profit and no voting rights, but in practice are an entirely new legal invention, and we don't really know how they work.
Is that not what Altman is referring to when he talks about vested equity? My understanding was employees had no other form of equity besides PPUs, in which case he’s talking non-misleadingly about the non-narrow case of vested PPUs, ie the thing people were alarmed about, right?
James Payor
It may be that talking about "vested equity" is avoiding some lie that would occur if he made the same claim about the PPUs. If he did mean to include the PPUs as "vested equity" presumably he or a spokesperson could clarify, but I somehow doubt they will.
I'd be a bit surprised if that's the answer; if OpenAI doesn't offer any vested equity, that half-truth feels overly blatant to me.
James Payor
Fwiw I will also be a bit surprised, because yeah. My thought is that the strategy Sam uses with stuff is to only invoke the half-truth if it becomes necessary later. Then he can claim points for candor if he doesn't go down that route. This is why I suspect (50%) that they will avoid clarifying that he means PPUs, and that they also won't state that they will not try to stop ex-employees from exercising them, etc. (Because it's advantageous to leave those paths open and to avoid having clearly lied in those scenarios.) I think of this as a pattern with Sam, e.g. "We are not training GPT-5" at the MIT talk and Senate hearings, which it turns out was optimized to mislead and got no further clarification iirc. There is a mitigating factor in this case which is that any threat to equity lights a fire under OpenAI staff, which I think is a good part of the reason that Sam responded so quickly.
James Payor
Okay I guess the half-truth is more like this:

Putting aside the fact that OpenAI drama seems to always happen in a world-is-watching fishbowl, this feels very much like the pedestrian trope of genius CTO getting sidelined as the product succeeds and business people pushing business interests take control. On his own, Ilya can raise money for anything he wants, hire anyone he wants, and basically just have way more freedom than he does at OpenAI.

I do think there is a basic p/doom vs e/acc divide which has probably been there all along, but as the tech keeps accelerating it becomes more and more of a sticking point.

I suspect in the depths of their souls, SA and Brock and the rest of that crowd do not really take the idea of existential threat to humanity seriously. Giving Ilya a "Safety and alignment" role probably now looks like a sop to A) shut the p-doomers up and B) signal some level of concern. But when push comes to shove, SA and team do what they know how to do -- push product out the door. Move fast and risk extinction.

One CEO I worked with summed up his attitude thusly: "Ready... FIRE! - aim."

Without resorting to exotic conspiracy theories, is it that unlikely to assume that Altman et al. are under tremendous pressure from the military and intelligence agencies to produce results to not let China or anyone else win the race for AGI? I do not for a second believe that Altman et al. are reckless idiots who do not understand what kind of fire they might be playing with, or that they would risk wiping out humanity just to beat Google on search. There must be bigger forces at play here, because that is the only thing that makes sense when reading Leike's comment and observing OpenAI's behavior.

I'm out of the loop. Did Daniel Kokotajlo lose his equity or not? If the NDA is not being enforced, are there now some disclosures being made?

Organizational structure is an alignment mechanism. 

While I sympathize with the stated intentions, I just can't wrap my head around the naivety. OpenAI's corporate structure was a recipe for bad corporate governance. "We are the good guys here, the structure is needed to make others align with us." An organization where ethical people can rule as benevolent dictators is the same mistake committed socialists made when they had power.

If it was that easy, AI alignment would be solved by creating ethical AI commit...

For Jan Leike to leave OpenAI I assume there must be something bad happening internally and/or he got a very good job offer elsewhere.

I find it hard to imagine a job offer that Jan Leike judged more attractive than OA superalignment. (unless he made an update similar to Kokotajlo's or something?)

we're fucked

I love the score of this comment as of writing: -1 karma points, 23 agree points.

Actually a great example of people using the voting system right. It does not contribute anything substantial to the conversation, but just expresses something most of us obviously feel.

I had to sort the 2 votes into the 4 prototypes to make sure I voted sensibly:

High Karma - Agree: A well expressed opinion I deeply share

High Karma - Disagree: A well argued counterpoint that I would never use myself / It did not convince me.

Low Karma - Agree: Something obvious/trivial/repeated that I agree with, but not worth saying here.

Low Karma - Disagree: The low-quality rest bucket.

Also, purely factual contributions (helpful links, context, etc.) should get karma votes only, as no opinion to disagree with is expressed.