Zach Stein-Perlman

AI strategy & governance. ailabwatch.org. Looking for new projects.

Sequences

Slowing AI

Comments

OpenAI has now publicly said "we're releasing former employees from existing nondisparagement obligations unless the nondisparagement provision was mutual." This seems to be self-effecting: by saying it, OpenAI made it true.

Hooray!

We know various people who've left OpenAI and might criticize it if they could. Either most of them will soon say they're free or we can infer that OpenAI was lying/misleading.

New Kelsey Piper article and twitter thread on OpenAI equity & non-disparagement.

It has lots of little things that make OpenAI look bad. It further confirms that OpenAI threatened to revoke equity unless departing employees signed the non-disparagement agreement. It also shows Altman's signature on documents giving the company broad power over employees' equity (perhaps he doesn't read every document he signs, but this one seems quite important). All of this is in tension with Altman's recent tweet that "vested equity is vested equity, full stop" and "i did not know this was happening." And "we have never clawed back anyone's vested equity, nor will we do that if people do not sign a separation agreement (or don't agree to a non-disparagement agreement)" is misleading given that OpenAI apparently regularly threatened to do so, or something equivalent (letting the employee nominally keep their PPUs but disallowing them from selling them), whenever an employee left.

Great news:

OpenAI told me that “we are identifying and reaching out to former employees who signed a standard exit agreement to make it clear that OpenAI has not and will not cancel their vested equity and releases them from nondisparagement obligations”

(Unless "employees who signed a standard exit agreement" is doing a lot of work — maybe a substantial number of employees technically signed nonstandard agreements.)

I hope to soon hear from various people that they have been freed from their nondisparagement obligations.


Update: OpenAI says:

As we shared with employees today, we are making important updates to our departure process. We have not and never will take away vested equity, even when people didn't sign the departure documents. We're removing nondisparagement clauses from our standard departure paperwork, and we're releasing former employees from existing nondisparagement obligations unless the nondisparagement provision was mutual. We'll communicate this message to former employees. We're incredibly sorry that we're only changing this language now; it doesn't reflect our values or the company we want to be.


[Low-effort post; might have missed something important.]

[Substantively edited after posting.]

50% I'll do this in the next two months if nobody else does. But not right now, and someone else should do it too.

Off the top of my head (this is not the list you asked for, just an outline):

  • Loopt stuff
  • YC stuff
  • YC removal
  • NDAs
    • And deceptive communication recently
    • And maybe OpenAI's general culture of don't publicly criticize OpenAI
  • Profit cap non-transparency
  • Superalignment compute
  • Two exoduses of safety people; negative stuff people-who-quit-OpenAI sometimes say
  • Telling board members not to talk to employees
  • Board crisis stuff
    • OpenAI executives telling the board Altman lies
    • The board saying Altman lies
    • Lying about why he wanted to remove Toner
    • Lying to try to remove Toner
    • Returning
    • Inadequate investigation + spinning results

Stuff not worth including:

  • Reddit stuff - unconfirmed
  • Financial conflict-of-interest stuff - murky and not super important
  • Misc instances of saying-what's-convenient (e.g. OpenAI should scale because of the prospect of compute overhang and the $7T chip investment thing) - idk, maybe, also interested in more examples
  • Johansson & Sky - not obvious that OpenAI did something bad; it would be nice for OpenAI to say "we had plans for a Johansson voice and we dropped that when Johansson said no," but if that were true they'd presumably have said it by now...

What am I missing? Am I including anything misleading or not-worth-it?

Quoting me from last time you said this:

The label "RSP" isn't perfect but it's kinda established now. My friends all call things like this "RSPs." . . . I predict change in terminology will happen ~iff it's attempted by METR or multiple frontier labs together. For now, I claim we should debate terminology occasionally but follow standard usage when trying to actually communicate.

Maybe setting up custom fine-tuning is hard and labs often only set it up during deployment...

(Separately, it would be nice if OpenAI and Anthropic let some safety researchers do fine-tuning now.)

I think Frontier Red Team is about eliciting model capabilities and Alignment Stress Testing is about "red-team[ing] Anthropic’s alignment techniques and evaluations, empirically demonstrating ways in which Anthropic’s alignment strategies could fail."

Thanks.

Any takes on what info a company could publish to demonstrate "the adequacy of its safety culture and governance"? (Or recommended reading?)

Ideally criteria are objectively evaluable / minimize illegible judgment calls.

Thanks.

Deployment mitigations level 2 discusses the need for mitigations on internal deployments.

Good point; this makes it clearer that "deployment" means external deployment by default. But level 2 only mentions "internal access of the critical capability," which sounds like it's about misuse — I'm more worried about AI scheming and escaping when the lab uses AIs internally to do AI development.

ML R&D will require thinking about internal deployments (and so will many of the other CCLs).

OK. I hope DeepMind does that thinking and makes appropriate commitments.

two-party control

Thanks. I'm pretty ignorant on this topic.

"every 3 months of fine-tuning progress" was meant to capture [during deployment] as well

Yayyy!

Thanks.

I'm glad to see that the non-compliance reporting policy has been implemented and includes anonymous reporting. I'm still hoping to see more details. (And I'm generally confused about why Anthropic doesn't share more details on policies like this — I fail to imagine a story about how sharing details could be bad, except that the details would be seen as weak and this would make Anthropic look bad.)

What details are you imagining would be helpful for you? Sharing the PDF of the formal policy document doesn't mean much compared to whether it's actually implemented and upheld and treated as a live option that we expect staff to consider (fwiw: it is, and I don't have a non-disparage agreement). On the other hand, sharing internal docs eats a bunch of time in reviewing it before release, chance that someone seizes on a misinterpretation and leaps to conclusions, and other costs.

Not sure. I can generally imagine a company publishing what Anthropic has published but having a weak/fake system in reality. Policy details do seem less important for non-compliance reporting than for some other policies. For example, Anthropic says it has an infohazard review policy, and I expect it's good, but I'm not confident; for other companies I wouldn't necessarily expect the policy to be good even if they say a formal policy exists, and seeing details (with sensitive bits redacted) would help.

I mostly take back my "secret policy is strong evidence of bad policy" insinuation: that's ~true on my home planet, but on Earth you don't get sufficient credit for sharing good policies and there's substantial negative EV from misunderstandings and adversarial interpretations, so I guess it's often correct not to share :(

As an 80/20 of publishing, maybe you could share a policy with an external auditor who would then publish whether they think it's good or have concerns. I would feel better if that happened all the time.
