Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders in the AI safety landscape offer some money or social reward for this?
I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people not to sign NDAs, would be very high-value.
@Daniel Kokotajlo If you indeed avoided signing an NDA, would you be able to share how much you passed up as a result of that? I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.
To clarify: I did sign something when I joined the company, so I'm still not completely free to speak (still under confidentiality obligations). But I didn't take on any additional obligations when I left.
Unclear how to value the equity I gave up, but it probably would have been about 85% of my family's net worth at least. But we are doing fine, please don't worry about us.
Is it that your family's net worth is $100 and you gave up $85? Or that your family's net worth is $15 and you gave up $85?
Either way, hats off!
The latter. Yeah idk whether the sacrifice was worth it but thanks for the support. Basically I wanted to retain my ability to criticize the company in the future. I'm not sure what I'd want to say yet though & I'm a bit scared of media attention.
I think having signed an NDA (and especially a non-disparagement agreement) from a major capabilities company should probably rule you out of any kind of leadership position in AI Safety, and especially any kind of policy position. Given that I think Daniel has a pretty decent chance of doing either or both of these things, and that work is very valuable and constrained on the kind of person that Daniel is, I would be very surprised if this wasn't worth it on altruistic grounds.
Edit: As Buck points out, different non-disclosure agreements can differ hugely in scope. To be clear, I think non-disclosure agreements that cover specific data or information you were given seem fine, but non-disclosure agreements that cover their own existence, or that are very broadly worded and prevent you from talking about basically anything related to an organization, are pretty bad. My sense is the stuff that OpenAI employees are asked to sign when they leave is very constraining, but my guess is the kind of stuff that people have to sign for a small amount of contract work or for events is not very constraining, though I would definitely read any contract carefully in this space.
Strong disagree re signing non-disclosure agreements (which I'll abbreviate as NDAs). I think it's totally reasonable to sign NDAs with organizations; they don't restrict your ability to talk about things you learned other ways than through the ways covered by the NDA. And it's totally standard to sign NDAs when working with organizations. I've signed OpenAI NDAs at least three times, I think--once when I worked there for a month, once when I went to an event they were running, once when I visited their office to give a talk.
I think non-disparagement agreements are way more problematic. At the very least, signing secret non-disparagement agreements should probably disqualify you from roles where your silence re an org might be interpreted as a positive sign.
My understanding is that the extent of NDAs can differ a lot between different implementations, so it might be hard to speak in generalities here. From the revealed behavior of people I poked here who have worked at OpenAI full-time, the OpenAI NDAs seem very comprehensive and limiting. My guess is also the NDAs for contractors and for events are a very different beast and much less limiting.
Also, the de-facto result of signing non-disclosure agreements is that people don't feel comfortable navigating the legal ambiguity and default very strongly to sharing approximately no information about the organization at all.
Maybe people would do better things here with more legal guidance, and I agree that you don't generally seem super constrained in what you feel comfortable saying, but like I sure now have run into lots of people who seem constrained by NDAs they signed (even without any non-disparagement component). Also, if the NDA has a gag clause that covers the existence of the agreement, there is no way to verify the extent of the NDA, and that makes navigating this kind of stuff super hard and also majorly contributes to people avoiding the topic completely.
It might be good on the current margin to have a norm of publicly listing any non-disclosure agreements you have signed (e.g. on one's LW profile), and their rough scope, so that other people can model what information you're committed to not sharing, and highlight if it covers anything beyond the details of technical research being done (e.g. if it covers social relationships or conflicts or criticism).
I have added the one NDA that I have signed to my profile.
But everyone has lots of duties to keep secrets or preserve privacy and the ones put in writing often aren't the most important. (E.g. in your case.)
I've signed ~3 NDAs. Most of them are irrelevant now and useless for people to know about, like yours.
I agree in special cases it would be good to flag such things — like agreements to not share your opinions on a person/org/topic, rather than just keeping trade secrets private.
My current best guess is that actually cashing out the vested equity is tied to an NDA, but I am really not confident. OpenAI has a bunch of really weird equity arrangements.
I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.
I wonder if it might be more effective to fund legal action against OpenAI than to compensate individual ex-employees for refusing to sign an NDA. Trying to take vested equity away from ex-employees who refuse to sign an NDA sounds unlikely to hold up in court, and if we can establish a legal precedent that OpenAI cannot do this, that might make other ex-employees much more comfortable speaking out against OpenAI than the possibility that third parties might fundraise to partially compensate them for lost equity would (a possibility you might not even be able to make every ex-employee aware of). The fact that this would avoid financially rewarding OpenAI for bad behavior is also a plus. Of course, legal action is expensive, but so is the value of the equity that former OpenAI employees have on the line.
Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).
I'm not gonna lie, I'm pretty crazily happy that a random quick take I wrote in 10 minutes on a Friday morning, about how Daniel Kokotajlo should get social reward and partial refunding, sparked a discussion that seems to have caused positive effects wayyyy beyond expectations.
Quick takes are an awesome innovation; they allow posting even when one is still partially confused/uncertain about something. Given the confusing details of the situation in that case, this would probably not have happened otherwise.
Idk what the LW community can do, but somehow, to the extent we think liberalism is valuable, the Western democracies need to urgently put a hard stop to Russia's and China's war (preparation) efforts. I fear that rearmament is a key component of the only viable path at this stage.
I won't argue in detail here but will link to Noahpinion, who's been quite vocal on those topics. The TLDR is that China and Russia have been scaling up their war-industry preparation efforts for years, while Western democracies' industries keep declining and remain crazily dependent on Chinese industry. This creates a new global equilibrium where the US is no longer powerful enough to disincentivize all authoritarian regimes from grabbing more land, etc.
Some readings relevant to that:
https://www.noahpinion.blog/p/were-not-ready-for-the-big-one
Why Putin probably won't stop with Ukraine: https://en.m.wikipedia.org/wiki/Minsk_agreements
Western democracies' current arsenal (centered around some very expensive units, like aircraft carriers) is not well suited to the modern im
Something which concerns me is that transformative AI will likely be a powerful destabilizing force, which will place countries currently behind in AI development (e.g. Russia and China) in a difficult position. Their governments are currently in the position of seeing that peacefully adhering to the status quo may lead to rapid disempowerment, and that the potential for coercive action to interfere with disempowerment is high. It is pretty clearly easier and cheaper to destroy chip fabs than create them, easier to kill tech employees with potent engineering skills than to train new ones.
I agree that conditions of war make safe transitions to AGI harder and make people more likely to accept higher risk. I don't see what to do about the fact that the development of AI power is itself presenting pressures towards war. This seems bad. I don't know what I can do to make the situation better though.
Defending liberal democracy is complex, because everyone wants to say that they are on the side of liberal democracy.
If you take the Verified Voting Foundation as one of the examples of highly recommended projects in the link, mainstream opinion these days is probably that its talking points are problematic because people might trust elections less when the foundation speaks about the need for a more trustworthy election process.
While I personally believe that pushing for a more secure voting system is good, it's a complex situation and many other projects in the space are similar. It's easy for a project that's funded for the purpose of strengthening liberal democracy to do the opposite.
Lighthaven City for 6.6M€? Worth a look by the Lightcone team.
https://x.com/zillowgonewild/status/1793726646425460738?t=zoFVs5LOYdSRdOXkKLGh4w&s=19
Glad you're keeping your eye out for these things!
It's 8 hours away from the Bay, which all-in is not that different from a plane flight to NY from the Bay, so the location doesn't really help with being where all the smart and interesting people are.
Before we started the Lightcone Offices we did a bunch of interviews to see if all the folks in the bay-area x-risk scene would click a button to move to the Presidio District in SF (i.e. imagine Lightcone team packs all your stuff and moves it for you and also all these other people in the scene move too) and IIRC most wouldn't because of things like their extended friend network and partners and so on (@habryka @jacobjacob am I remembering that correctly?). And that's only a ~1.5-hr move for most of them.
Given the recent argument on whether Anthropic really did commit to not push the frontier or just misled most people into thinking that was the case, it's relevant to reread the RSP in hairsplitting mode. Rereading it, I noticed a few relevant findings:
Disclaimer: this is focused on negative stuff but does not deny the merits of RSPs etc etc.
I currently think Anthropic didn't "explicitly publicly commit" to not advance the rate of capabilities progress. But, I do think they made deceptive statements about it, and when I complain about Anthropic I am complaining about deception, not "failing to uphold literal commitments."
I'm not talking about the RSPs because the writing and conversations I'm talking about came before that. I agree that the RSP is more likely to be a good predictor of what they'll actually do.
I think most of the generator for this was more like "in person conversations", at least one of which was between Dario and Dustin Moskowitz:
The most explicit public statement I know is from this blogpost (which I agree is not an explicit commitment, but, I do think
...
- Capabilities: AI research aimed at making AI systems generally better at any sort of task, including writing, image processing or generation, game playing, etc. Research that makes large language models more efficient, or that improves reinforcement learning algorithms, would fall under this heading. Capabilities work generates and improves on the models that we investigate and utilize in our alignment research. We generally don’t publish this ki
You are a LessWrong reader, want to push humanity's wisdom and don't know how to do so? Here's a workflow:
See an application of the workflow here: https://www.lesswrong.com/posts/epgCXiv3Yy3qgcsys/you-can-t-predict-a-game-of-pinball?commentId=wjLFhiWWacByqyu6a
Do people have takes on the most useful metrics/KPIs that could give a sense of how good the monitoring/anti-misuse measures on APIs are?
Some ideas:
a) average time to close an account conducting misuse activities (my sense is that as long as this is >1 day, there's little chance of preventing state actors from using API-based models for a lot of misuse (anything that doesn't require major scale))
b) the logs of the 5 accounts/interactions that have been ranked as highest severity (my sense is that incident reporting like OpenAI/Microsoft have done on c...
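For concreteness, metric (a) could be computed from an incident log along these lines. This is a minimal sketch; the function name, log format, and sample timestamps are all made up for illustration, not any lab's actual schema:

```python
from datetime import datetime, timedelta

def avg_time_to_close(incidents):
    """Average time from first detected misuse to account closure.

    `incidents` is a list of (detected_at, closed_at) datetime pairs,
    one per confirmed-misuse account.
    """
    deltas = [closed - detected for detected, closed in incidents]
    return sum(deltas, timedelta()) / len(deltas)

# Hypothetical incident log: two accounts, closed after 6h and 48h.
log = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 15, 0)),
    (datetime(2024, 5, 2, 12, 0), datetime(2024, 5, 4, 12, 0)),
]

kpi = avg_time_to_close(log)        # 27 hours on this sample data
breached = kpi > timedelta(days=1)  # flags the >1 day threshold from (a)
```

An average hides tail behavior, so in practice you'd probably also want a percentile version (e.g. "95% of misuse accounts closed within N hours"), since a single slow-to-close state-actor account matters more than many fast closures.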
Playing catch-up is way easier than pushing the frontier of LLM research. One is about guessing which path others took; the other is about carving a path among all the possible ideas that could work.
If China stopped having access to US LLM secrets and had to push the LLM frontier rather than play catch-up, how much slower would it be at doing so?
My guess is at least >2x and probably more but I'd be curious to get takes.
I've been thinking a lot recently about taxonomizing AI-risk-related concepts to reduce the dimensionality of AI threat modelling while remaining quite comprehensive. It's in the context of developing categories to assess whether labs' plans cover various areas of risk.
There are two questions I'd like to get takes on. Any take on either of these two would be very valuable.
There are a number of properties of AI systems that make it easier to collect information in a safe way about those systems and hence demonstrate their safety: interpretability, formal verifiability, modularity, etc. Which adjective would you use to characterize those properties?
I'm thinking of "resilience" because, from the perspective of an AI developer, it helps a lot with understanding the risk profile, but do you have other suggestions?
Some alternatives: