Non-Disparagement Canaries for OpenAI

Adam Scholl

To be more explicit, I’m not under any nondisparagement agreement, nor have I ever been. I left OpenAI prior to my cliff, and have never had any vested OpenAI equity.

I am under a way more normal and time-bounded nonsolicit clause with Alphabet.

[-]Beth Barnes2y*12110

I signed the secret general release containing the non-disparagement clause when I left OpenAI. From more recent legal advice I understand that the whole agreement is unlikely to be enforceable, especially a strict interpretation of the non-disparagement clause like in this post. IIRC at the time I assumed that such an interpretation (e.g. where OpenAI could sue me for damages for saying some true/reasonable thing) was so absurd that couldn't possibly be what it meant.
^[1]
I sold all my OpenAI equity last year, to minimize real or perceived CoI with METR's work. I'm pretty sure it never occurred to me that OAI could claw back my equity or prevent me from selling it. ^[2]

OpenAI recently informally notified me by email that they would release me from the non-disparagement and non-solicitation provisions in the general release (but not, as in some other cases, the entire agreement.) They also said OAI "does not intend to enforce" these provisions in other documents I have signed. It is unclear what the legal status of this email is given that the original agreement states it can only be modified in writing signed by both parties.

As far as I can recall, concern about financial penalties for violating non-disparagement provisions was never a consideration that affected my decisions. I think having signed the agreement probably had some effect, but more like via "I want to have a reputation for abiding by things I signed so that e.g. labs can trust me with confidential information". And I still assumed that it didn't cover reasonable/factual criticism.

That being said, I do think many researchers and lab employees, myself included, have felt restricted from honestly sharing their criticisms of labs beyond small numbers of trusted people. In my experience, I think the biggest forces pushing against more safety-related criticism of labs are:

(1) confidentiality agreements (any criticism based on something you observed internally would be prohibited by non-disclosure agreements - so the disparagement clause is only relevant in cases where you're criticizing based on publicly available information)
(2) labs' informal/soft/not legally-derived powers (ranging from "being a bit less excited to collaborate on research" or "stricter about enforcing confidentiality policies with you" to "firing or otherwise making life harder for your colleagues or collaborators" or "lying to other employees about your bad conduct" etc)
(3) general desire to be researchers / neutral experts rather than an advocacy group.

To state what is probably obvious: I don't think labs should have non-disparagement provisions. I think they should have very clear protections for employees who wanted to report safety concerns, including if this requires disclosing confidential information. I think something like the asks here are a reasonable start, and I also like Paul's idea (which I can't now find the link for) of having labs make specific "underlined statements" to which employees can anonymously add caveats or contradictions that will be publicly displayed alongside the statements. I think this would be especially appropriate for commitments about red lines for halting development (e.g. Responsible Scaling Policies) - a statement that a lab will "pause development at capability level x until they have implemented mitigation y" is an excellent candidate for an underlined statement

^{^}
Regardless of legal enforceability, it also seems like it would be totally against OpenAI's interests to sue someone for making some reasonable safety-related criticism.
^{^}
I would have sold sooner but there are only intermittent opportunities for sale. OpenAI did not allow me to donate it, put it in a DAF, or gift it to another employee. This maybe makes more sense given what we know now. In lieu of actually being able to sell, I made a legally binding pledge in Sep 2021 to donate 80% of any OAI equity.

[-]Lukas Finnveden2y*120

I also like Paul's idea (which I can't now find the link for) of having labs make specific "underlined statements" to which employees can anonymously add caveats or contradictions that will be publicly displayed alongside the statements

Link: https://sideways-view.com/2018/02/01/honest-organizations/

[-]Beth Barnes2y40

Also FWIW I'm very confident Chris Painter has never been under any non-disparagement obligation to OpenAI

[-]JeremySchlatter2y11517

There is a factor that may be causing people who have been released to not report it publicly:

When I received the email from OpenAI HR releasing me from the non-disparagement agreement, I wanted to publicly acknowledge that fact. But then I noticed that, awkwardly, I was still bound not to acknowledge that it had existed in the first place. So I didn't think I could say, for example, "OpenAI released me from my non-disparagement agreement" or "I used to be bound by a non-disparagement agreement, but now I'm not."

So I didn't say anything about it publicly. Instead, I replied to HR asking for permission to disclose the previous non-disparagement agreement. Thankfully they gave it to me, which is why I'm happy to talk about it now. But if I hadn't taken the initiative to email them I would have been more hesitant to reveal that I had been released from the non-disparagement agreement.

I don't know if any other ex-OpenAI employees are holding back for similar reasons. I may have been unusually cautious or pedantic about this. But it seemed worth mentioning in case I'm not the only one.

[-]William_S2y128

Language in the emails included:

"If you executed the Agreement, we write to notify you that OpenAI does not intend to enforce the Agreement"

I assume this also communicates that OpenAI doesn't intend to enforce the self-confidentiality clause in the agreement

[-]JeremySchlatter2y132

Oh, interesting. Thanks for pointing that out! It looks like my comment above may not apply to post-2019 employees.

(I was employed in 2017, when OpenAI was still just a non-profit. So I had no equity and therefore there was no language in my exit agreement that threatened to take my equity. The equity-threatening stuff only applies to post-2019 employees, and their release emails were correspondingly different.)

The language in my email was different. It released me from non-disparagement and non-solicitation, but nothing else:

"OpenAI writes to notify you that it is releasing you from any non-disparagement and non-solicitation provision within any such agreement."

[-]simeon_c2y4352

Great initiative! Thanks for leading the charge on this.

[-]chris_painter2y310

I have never owned equity in OpenAI, and have never to my knowledge been in any nondisparagement agreement with OpenAI

[-]Neel Nanda2y3115

Geoffrey Irving (Research Director, AI Safety Institute)

Given the tweet thread Geoffrey wrote during the board drama, it seems pretty clear that he's willing to publicly disparage OpenAI. (I used to work with Geoffrey, but have no private info here)

[-]Adam Scholl2y2917

I agree, but I think it still matters whether or not he's bound by the actual agreement. One might imagine that he's carefully pushing the edge of what he thinks he can get away with saying, for example, in which case he may still not be fully free to speak his mind. And since I would much prefer to live in a world where he is, I'm wary of prematurely concluding otherwise without clear evidence.

[-]Neel Nanda2y52

Fair point

[-]Geoffrey Irving2y*349

I endorse Neel’s argument.

(Also see more explicit comment above, apologies for trying to be cute. I do think I have already presented extensive evidence here.)

[-]anon_standards2y*252

[-]habryka2y301

What's the source of this? Will also DM you.

[-]Nick_Tarleton2y258

(I am not a lawyer)

The usual argument (e.g.) for warrant canaries being meaningful is that the (US) government has much less legal ability to compel speech (especially false speech) than to prohibit it. I don't think any similar argument holds for private contracts; AFAIK they can require speech, and I don't know whether anything is different if the required speech is known by both parties to be false. (The one relevant search result I found doesn't say there's anything preventing such a contract; Claude says there isn't, but it could be thrown out on grounds of public policy or unconscionability.)

I would think this 'canary' still works, because it's hard to imagine OpenAI suing, or getting anywhere with a suit, for someone not proactively lying (when silence could mean things besides 'I am subject to an NDA'). But, if a contract requiring false speech would be valid,

insofar as this works it works for different reasons than a warrant canary
it could stop working, if future NDAs are written with it in mind

(Quibbles aside, this is a good idea; thanks for making it!)

[-]Adam Scholl2y95

Yeah, the proposal here differs from warrant canaries in that it doesn't ask people to proactively make statements ahead of time—it just relies on the ability of some people who can speak, to provide evidence that others can't. So if e.g. Bob and Joe have been released, but Alice hasn't, then Bob and Joe saying they've been released makes Alice's silence more conspicuous.

[-]Orpheus162y117

Minor note: Paul is at the US AI Safety Institute, while Jade & Geoffrey are at the UK AI Safety Institute.

[-]Adam Scholl2y62

Thanks! Edited to fix.

[-]Garrett Baker2y90

A market on the subject: https://manifold.markets/GarrettBaker/which-of-the-names-below-will-i-rec?r=R2FycmV0dEJha2Vy

[-]Mateusz Bagiński2y60

In response, some companies began listing warrant canaries on their websites—sentences stating that they had never yet been forced to reveal any client data. If at some point they did receive such a warrant, they could then remove the canary without violating their legal non-disclosure obligation, thereby allowing the public to gain indirect evidence about this otherwise-invisible surveillance.

Can the gov force them not to remove the canary?

[-]kilotaras2y130

Can the gov force them not to remove the canary?

In theory this can be circumvented by regularly publishing "as of MM YYYY i have not received a warrant from government".

In practice - we would not know if someone was forced by government to produce a particular canary. Government that can force someone to lie would likely be also able to force someone not to disclose they were forced.

Another thing government can do (and in case of Australia did) is to prohibit canaries in some cases alltogether by making it illegal to "disclose information about the existence or non-existence".

[-]Review Bot2y*31

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year.

Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?

[-]Tamsin Leake2y3-1

sigh I wish people realized how useless it is to have money when the singularity happens. Either we die or we get a utopia in which it's pretty unlikely that pre-singularity wealth matters. What you want to maximize is not your wealth but your utility function, and you sure as hell are gonna get more from LDT handshakes with aligned superintelligences in saved worlds, if you don't help OpenAI reduce the amount of saved worlds.

[-]TsviBT2y2217

I wish you would realize that whatever we're looking at, it isn't people not realizing this.

[-]Rob Bensinger2y22

[-]TsviBT2y2313

I'm interpreting "realize" colloquially, as in, "be aware of". I don't think the people discussed in the post just haven't had it occur to them that pre-singularity wealth doesn't matter because a win singularity society very likely wouldn't care much about it. Instead someone might, for example...

...care a lot about their and their people's lives in the next few decades.
...view it as being the case that [wealth mattering] is dependent on human coordination, and not trust others to coordinate like that. (In other words: the "stakeholders" would have to all agree to cede de facto power from themselves, to humanity.)
...not agree that humanity will or should treat wealth as not mattering; and instead intend to pursue a wealthy and powerful position mid-singularity, with the expectation of this strategy having large payoffs.
...be in some sort of mindbroken state (in the genre of Moral Mazes), such that they aren't really (say, in higher-order derivatives) modeling the connection between actions and long-term outcomes, and instead are, I don't know, doing something else, maybe involving arbitrary obeisance to power.

I don't know what's up with people, but I think it's potentially important to understand deeply what's up with people, without making whatever assumption goes into thinking that IF someone only became aware of this vision of the future, THEN they would adopt it.

(If Tammy responded that "realize" was supposed to mean the etymonic sense of "making real" then I'd have to concede.)

[-]ryan_greenblatt2y82

Instead someone might, for example...

Isn't the central one "you want to spend money to make a better long term future more likely, e.g. by donating it to fund AI safety work now"?

Fair enough if you think the marginal value of money is negligable, but this isn't exactly obvious.

[-]TsviBT2y76

That's another main possibility. I don't buy the reasoning in general though--integrity is just super valuable. (Separately I'm aware of projects that are very important and neglected (legibly so) without being funded, so I don't overall believe that there are a bunch of people strategically capitulating to anti-integrity systems in order to fund key projects.) Anyway, my main interest here is to say that there is a real, large-scale, ongoing problem(s) with the social world, which increases X-risk; it would be good for some people to think clearly about that; and it's not good to be satisfied with false / vague / superficial stories about what's happening.

[-]the gears to ascension2y1710

downvote and agree. but being financially ruined makes it harder to do other things, and it's probably pretty aversive to go through even if you expect things to turn out better in expectation because of it. the canaries thing seems pretty reasonable to me in light of this.

[-]Thomas Kwa2y70

I care about my wealth post-singularity and would be wiling to make bets consistent with this preference, e.g. I pay 1 share of QQQ now, you pay me 3 shares of QQQ 6 months after the world GDP has 10xed if we are not all dead then.

[-]RedMan2y-40

Based on your recent post here: https://www.lesswrong.com/posts/55rc6LJcqRmyaEr9T/please-stop-publishing-ideas-insights-research-about-ai

Can I mark you down as in favor of AI related NDAs? In your ideal world, would a perfect solution be for a single large company to hire all the capable AI researchers, give them aggressive non disclosure and non compete agreements, then shut down every part of the company except the legal department that enforces the agreements?

[-][anonymous]2y56

I'm a different person but I would support contracts which disallow spread of capabilities insights, but not contracts which disallow criticism of AI orgs (and especially not surprise ones).

IIUC the latter is what what the OAI-NonDisparagement-controversy has been about.

I'm not confident the following is true, but it seems to me that your first question was written under a belief that the controversy was about both of those at once. It seems like it was trying (under that world model) to 'axiomatically' elicit a belief in disagreement with an ongoing controversy, which would be non-truthseeking.

[-]the gears to ascension2y42

That seems like a misgeneralization, and I'd like to hear what thoughts you'd have depending on the various answers that could be given in the framework you raise. I'd imagine that there are a wide variety of possible ways a person could be limited in what they choose to say, and being threatened if they say things is a different situation than if they voluntarily do not: for example, the latter allows them to criticize.

[-]Jacob_Hilton2y2-50

We were especially alarmed to notice that the list contains at least 12 former employees currently working on AI policy, and 6 working on safety evaluations. This includes some in leadership positions, for example:

I don't really follow this reasoning. If anything, playing a leadership role in AI policy or safety evaluations will usually give you an additional reason not to publicly disparage AI companies, to avoid being seen as partisan, making being subject to such an agreement less of an issue. I would be pretty surprised if such people subject to these agreements felt particularly constrained in what they could say as part of their official duties, although if I am wrong about this then it does seem like quite a concerning thing to have happened. The obvious exception to this is if the role involves unofficial public commentary about labs, but it's not obvious to me that this has been a big part of the role of any of the people on your list, and even then, they may not have felt especially constrained, depending on the individual. It's also worth noting that several of these roles require the holder to give up or donate lab equity to avoid any conflict of interest, regardless of any non-disparagement agreements.

I suspect the crux may be our differing interpretations of the agreement. I'm not sure where your interpretation that it prohibits "taking any actions which might make the company less valuable" comes from, maybe you could highlight the part of the agreement you are basing that on.

[This comment is no longer endorsed by its author]Reply

[-]Amalthea2y4841

When you have a role in policy or safety, it may usually be a good idea not to voice strong opinions on any given company. If you nevertheless feel compelled to do so by circumstances, it's a big deal if you have personal incentives against that - especially if they're not disclosed.

[-]Jacob_Hilton2y13-19

Yeah I agree with this, and my original comment comes across too strongly upon re-reading. I wanted to point out some counter-considerations, but the comment ended up unbalanced. My overall view is:

It was highly inappropriate for the company to have been issuing these agreements so widely, especially using such aggressive tactics and without allowing disclosure of the agreement, given the technology that it is developing.
The more high-profile and credible a person is, the more damaging it is for this person to have been subject to the agreement.
Nevertheless, it is a mistake to think of potential "disparagement" as part of the job duties of most of the people mentioned, and the post appears to wildly misinterpret the meaning of this term as "taking any actions which might make the company less valuable".
Ultimately, it would have looked extremely bad for the company to enforce one of these agreements, so the primary effect of the contract comes down to how individuals felt that it constrained their behavior. We don't have great visibility into this. It's possible that some of these people felt quite constrained, and it's also possible that some of these people weren't even aware of the non-disparagement clause because they didn't notice it when they signed.
Thankfully, most of this is now moot as the company has retracted the contract. I should emphasize that there may remain some legal ambiguity and additional avenues for retaliation, but I am optimistic that these will be cleaned up in the near future. There will still be non-disparagement agreements in place in cases where "the non-disparagement provision was mutual" (in the words of the company), but my strong guess is that this refers only to the original Anthropic departures and perhaps a handful of other individuals who were high up at the company.
It remains important for people to disclose their financial interest in the company when appropriate, or in some cases give up this interest to avoid a conflict of interest.

Note: I have a financial interest in the company and was subject to one of these agreements until recently.

[-]habryka2y5956

Thankfully, most of this is now moot as the company has retracted the contract.

I don't think any of this is moot, since the thing that is IMO most concerning is people signing these contracts, then going into policy or leadership positions and not disclosing that they signed those contracts. Those things happened in the past and are real breaches of trust.

[-]William_S2y6-12

I imagine many of the people going into leadership positions were prepared to ignore the contract, or maybe even forgot about the nondisparagement clause altogether. The clause is also open to more avenues of legal attack if it's enforced against someone who takes another position which requires disparagement (e.g. if it's argued to be a restriction on engaging in business). And if any individual involved divested themselves of equity before taking up another position, there would be fewer ways for the company to retaliate against them. I don't think it's fair to view this as a serious breach of trust on behalf of any individual, without clear evidence that it impacted their decisions or communication.

But it is fair to view the situation overall as concerning that it could happen with nobody noticing, and try to design defenses to prevent this or similar things happening in the future, e.g. some clear statement around not having any conflict of interest including legal obligations for people going into independent leadership positions, as well as a consistent divestment policy (though that creates its own wierd incentives).

[-]aysja2y127

I imagine many of the people going into leadership positions were prepared to ignore the contract, or maybe even forgot about the nondisparagement clause

I could imagine it being the case that people are prepared to ignore the contract. But unless they publicly state that, it wouldn’t ameliorate my concerns—otherwise how is anyone supposed to trust they will?

The clause is also open to more avenues of legal attack if it's enforced against someone who takes another position which requires disparagement (e.g. if it's argued to be a restriction on engaging in business).

That seems plausible, but even if this does increase the likelihood that they’d win a legal battle, legal battles still pose huge risk and cost. This still seems like a meaningful deterrent.

I don't think it's fair to view this as a serious breach of trust on behalf of any individual, without clear evidence that it impacted their decisions or communication.

But how could we even get this evidence? If they’re bound to the agreement their actions just look like an absence of saying disparaging things about OpenAI, or of otherwise damaging their finances or reputation. And it’s hard to tell, from the outside, whether this is a reflection of an obligation, or of a genuine stance. Positions of public responsibility require public trust, and the public doesn’t have access to the inner workings of these people’s minds. So I think it’s reasonable, upon finding out that someone has a huge and previously-undisclosed conflict of interest, to assume that might be influencing their behavior.

[-]William_S2y5-1

Evidence could look like 1. Someone was in a position where they had to make a judgement about OpenAI and was in a position of trust 2. They said something bland and inoffensive about OpenAI 3. Later, independently you find that they likely would have known about something bad that they likely weren't saying because of the nondisparagement agreement (instead of ordinary confidentially agreements).

This requires some model of "this specific statement was influenced by the agreement" instead of just "you never said anything bad about OpenAI because you never gave opinions on OpenAI".

I think one should require this kind of positive evidence before calling it a "serious breach of trust", but people can make their own judgement about where that bar should be.

[-]Adam Scholl2y63

I agree, but I also doubt the contract even has been widely retracted. Why do you think it has, Jacob? Quite few people have reported being released so far.

[-]KPier2y645

(This is Kelsey Piper). I am quite confident the contract has been widely retracted. The overwhelming majority of people who received an email did not make an immediate public comment. I am unaware of any people who signed the agreement after 2019 and did not receive the email, outside cases where the nondisparagement agreement was mutual (which includes Sutskever and likely also Anthropic leadership). In every case I am aware of, people who signed before 2019 did not reliably receive an email but were reliably able to get released if they emailed OpenAI HR.

If you signed such an agreement and have not been released, you can of course contact me on Signal: 303 261 2769.

[-]Adam Scholl2y110

I am quite confident the contract has been widely retracted.

Can you share your reasons for thinking this? Given that people who remain bound can’t say so, I feel hesitant to conclude that people aren’t without clear evidence.

I am unaware of any people who signed the agreement after 2019 and did not receive the email, outside cases where the nondisparagement agreement was mutual (which includes Sutskever and likely also Anthropic leadership).

Excepting Jack Clark (who works for Anthropic) and Remco Zwetsloot (who left in 2018), I would think all the policy leadership folks listed above meet these criteria, yet none have reported being released. Would you guess that they have been?

[-]KPier2y357

I have been in touch with around a half dozen former OpenAI employees who I spoke to before former employees were released and all of them later informed me they were released, and they were not in any identifiable reference class such that I’d expect OpenAI would have been able to selectively release them while not releasing most people. I have further been in touch with many other former employees since they were released who confirmed this. I have not heard from anyone who wasn’t released, and I think it is reasonably likely I would have heard from them anonymously on Signal. Also, not releasing a bunch of people after saying they would seems like an enormously unpopular, hard to keep secret, and not very advantageous move for OpenAI, which is already taking a lot of flak for this. I also have a model of how people choose whether or not to make public statements where it’s extremely unsurprising most people would not choose to do so.

I would indeed guess that all of the people you listed have been released if they were even subject to such agreements in the first place, which I do not know (and the fact Geoffrey Irving was not offered such an agreement is some basis to think they were not uniformly imposed during some of the relevant time periods, imo.)

[-]Adam Scholl2y*217

Thanks, that's helpful context.

I also have a model of how people choose whether or not to make public statements where it’s extremely unsurprising most people would not choose to do so.

I agree it's unsurprising that few rank-and-file employees would make statements, but I am surprised by the silence from those in policy/evals roles. From my perspective, active non-disparagement obligations seem clearly disqualifying for most such roles, so I'd think they'd want to clarify.

[-]chanamessinger2y925

It sounds from this back and forth like we should assume that Anthropic leadership who left from OAI (so Dario and Daniela Amodei, Jack Clark, Sam McCandlish, others?) are still under NDA because it was probably mutual. Does that sound right to others?

[-]aysja2y78

I have not heard from anyone who wasn’t released, and I think it is reasonably likely I would have heard from them anonymously on Signal. Also, not releasing a bunch of people after saying they would seems like an enormously unpopular, hard to keep secret, and not very advantageous move for OpenAI, which is already taking a lot of flak for this.

I’m not necessarily imagining that OpenAI failed to release a bunch of people, although that still seems possible to me. I’m more concerned that they haven’t released many key people, and while I agree that you might have received an anonymous Signal message to that effect if it were true, I still feel alarmed that many of these people haven’t publicly stated otherwise.

I also have a model of how people choose whether or not to make public statements where it’s extremely unsurprising most people would not choose to do so.

I do find this surprising. Many people are aware of who former OpenAI employees are, and hence are aware of who was (or is) bound by this agreement. At the very least, if I were in this position, I would want people to know that I was no longer bound. And it does seem strange to me, if the contract has been widely retracted, that so few prominent people have confirmed being released.

It also seems pretty important to figure out who is under mutual non-disparagement agreements with OpenAI, which would still (imo) pose a problem if it applied to anyone in safety evaluations or policy positions.

[-]Jacob_Hilton2y11-2

See the statement from OpenAI in this article:

We're removing nondisparagement clauses from our standard departure paperwork, and we're releasing former employees from existing nondisparagement obligations unless the nondisparagement provision was mutual. We'll communicate this message to former employees.

They have communicated this to me and I believe I was in the same category as most former employees.

I think the main reasons so few people have mentioned this are:

As I mentioned, there is still some legal ambiguity and additional avenues for retaliation
Some people are taking their time over what they want to say
Most people don't want to publicly associate themselves with a controversial situation
Most people aren't inclined to disparage their former employer anyway, and so they may not think of their own situation as that big of a deal

[-]Adam Scholl2y11-1

the post appears to wildly misinterpret the meaning of this term as "taking any actions which might make the company less valuable"

I'm not a lawyer, and I may be misinterpreting the non-interference provision—certainly I'm willing to update the post if so! But upon further googling, my current understanding is still that in contracts, "interference" typically means "anything that disrupts, damages or impairs business."

And the provision in the OpenAI offboarding agreement is written so broadly—"Employee agrees not to interfere with OpenAI’s relationship with current or prospective employees, current or previous founders, portfolio companies, suppliers, vendors or investors"—that I assumed it was meant to encompass essentially all business impact, including e.g. the company's valuation.

[-]Thrasymachus2y1613

I see the concerns as these:

The four corners of the agreement seem to define 'disparagement' broadly, so one might reasonably fear (e.g.) "First author on an eval especially critical of OpenAI versus its competitors", or "Policy document highly critical of OpenAI leadership decisions" might 'count'.
Given Altman's/OpenAI's vindictiveness and duplicity, and the previous 'safeguards' (from their perspective) which give them all the cards in terms of folks being able to realise the value of their equity, "They will screw me out of a lot of money if I do something they really don't like (regardless of whether it 'counts' per the non-disparagement agreement)" seems a credible fear.
1. It appears Altman tried to get Toner kicked off the board for being critical of OpenAI in a policy piece, after all.
This is indeed moot for roles which require equity to be surrendered anyway. I'd guess most roles outside government (and maybe some within it) do not have such requirements. A conflict of interest roughly along the lines of the first two points makes impartial performance difficult, and credible impartial performance impossible (i.e. even if indeed Alice can truthfully swear "My being subject to such an agreement has never influenced my work in AI policy", reasonable third parties would be unwise to believe her).
The 'non-disclosure of non-disparagement' makes this worse, as it interferes with this conflict of interest being fully disclosed. "Alice has a bunch of OpenAI equity" is one thing, "Alice has a bunch of OpenAI equity, and has agreed to be beholden to them in various ways to keep it" is another. We would want to know the latter to critically appraise Alice's work whenever it is relevant to OpenAI's interests (and I would guess a lot of policy/eval/reg/etc. would be sufficiently relevant that we'd like to contemplate whether Alice's commitments colour her position). Yet Alice has also promised to keep these extra relevant details secret.

^{^}

You can read the full documents at the bottom of Kelsey Piper’s excellent report, but here are some key excerpts:

Non-Disclosure: “Employee agrees that Employee will now and forever keep the terms and monetary settlement amount of this Agreement completely confidential, and that Employee shall not disclose such to any other person directly or indirectly.”

Liability: “Employee agrees that the failure to comply with... the confidentiality, non-disparagement, non-competition, and non-solicitation obligations set forth in this Agreement shall amount to a material breach of this Agreement which will subject Employee to the liability for all damages OpenAI might incur.”

Non-Interference: “Employee agrees not to interfere with OpenAI’s relationship with current or prospective employees, current or previous founders, portfolio companies, suppliers, vendors or investors. Employee also agrees to refrain from communicating any disparaging, defamatory, libelous, or derogatory statements, in a manner reasonably calculated to harm OpenAI’s reputation, to any third party regarding OpenAI or any of the other Releasees.”

^{^}

Thank you to AI Watch for providing some of this data.

^{^}

In total there are 7 people who have publicly reported not being subject to the terms. Daniel Kokotajlo was offered the agreement but didn't sign; Gretchen Krueger, Cullen O'Keefe, and Evan Hubinger are not subject to the agreement, either because they didn't sign it or because it wasn't offered to them.

^{^}

Assuming former board members were expected to sign similar agreements, Helen Toner (Director of Strategy, Center for Security and Emerging Technology) may be subject to non-disparagement as well; Holden Karnofsky (Visiting Scholar, Carnegie Endowment for International Peace) confirms that he didn't sign.

^{^}

Edited to remove Chris Painter (Head of Policy, METR), Geoffrey Irving (Research Director, UK AI Safety Institute), and Remco Zwetsloot (Executive Director, Horizon Institute for Public Service), who report not signing the agreement; and Beth Barnes (Head of Research, METR), who reports being recently released.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

290

Non-Disparagement Canaries for OpenAI

290

290